Remote sensing via $\ell_1$-minimization

Max Hügel, Holger Rauhut, Thomas Strohmer
May 4, 2012; revised April 24, 2013 



Abstract 

We consider the problem of detecting the locations of targets in the far field by sending
probing signals from an antenna array and recording the reflected echoes. Drawing on key
concepts from the area of compressive sensing, we use an $\ell_1$-based regularization approach
to solve this, in general ill-posed, inverse scattering problem. As is common in compressive
sensing, we exploit randomness, which in this context comes from choosing the antenna
locations at random. With $n$ antennas we obtain $n^2$ measurements of a vector $x \in \mathbb{C}^N$
representing the target locations and reflectivities on a discretized grid. It is common to
assume that the scene $x$ is sparse due to a limited number of targets. Under a natural
condition on the mesh size of the grid, we show that an $s$-sparse scene can be recovered
via $\ell_1$-minimization with high probability if $n^2 \geq C s \log^2(N)$. The reconstruction is stable
under noise and under passing from sparse to approximately sparse vectors. Our theoretical
findings are confirmed by numerical simulations.

AMS Subject Classification: 65K05, 65C99, 65F22, 94A99, 90C25

1 Introduction

Our aim is to detect the locations and reflectivities of remote targets (point scatterers) by
sending probing signals from an antenna array and recording the reflected signals. This type of
inverse scattering, which has applications in radar, sonar, medical imaging, and microscopy,
is a rather challenging numerical problem. Typically the solution is not unique and
instabilities in the presence of noise are a common issue. Standard techniques, such as matched
field processing [HU] or time reversal methods [Tj fTHl [TTJ] work well for the detection of very few, 
well separated targets. However, when the number of targets increases and/or some targets 
are adjacent to each other, these methods run into severe problems. Moreover, these methods 
have major difficulties when the dynamic range between the reflectivities of the targets is large. 

In [13] a compressive sensing based approach to the inverse scattering problem was proposed
to overcome the ill-posedness of the problem by utilizing the sparsity of the target scene. Here,
sparsity is meant in the sense that the targets typically occupy only a small fraction of the
overall region of interest. As is common in compressive sensing [13, 4, 15, 26], randomness is used,
and in this setup it is realized by placing the antennas at random locations on a square. It
was proved in [14] that under certain conditions it is possible to exactly recover the locations
and reflectivities of the targets from noise-free measurements by solving an $\ell_1$-regularized
optimization problem, also known as basis pursuit in the compressive sensing literature.

Fig. 1 (a) Scene with 100 targets in 6400 resolution cells (b) Reconstruction from 900 noisy
measurements with SNR of 20dB

While the framework in [13] can lead to significant improvements over traditional methods,
it also has several limitations. For instance, the main theoretical result in that article requires
the targets to be randomly spaced, a condition that is quite restrictive and does not match well
with practical scenarios. Also the conditions on the number of targets that can be recovered
are far from optimal. In this paper we will overcome most of these limitations, thus leading
to a theoretical framework that is better adapted to practical applications. In particular, we
also show that recovery is stable with respect to measurement noise and under passing from
sparse to approximately sparse scenes. Figure 1 depicts the reconstruction of a sparse scene of
100 targets in 6400 resolution cells with reflectivities in the dynamic range from 1 to 8 from
900 noisy measurements, that is, with 30 antennas. Both the detection performance and the
approximation of the true values of the reflectivities are very good.

What makes the inverse scattering problem with antenna arrays challenging from a com- 
pressive sensing viewpoint is that the associated sensing matrix is not a random matrix with 
independent rows or columns, but the matrix entries are random variables which are cou- 
pled across rows and columns. This in turn means that standard proof techniques from the 
compressive sensing literature cannot be applied readily and results developed for structured 
sensing matrices [26] are of limited use in our case. In fact, it is an open problem whether the
by now classical and often used restricted isometry property holds for the random scattering 
matrix arising in our context. Instead we provide high probability recovery bounds for a fixed 
vector and a random choice of the scattering matrix (also referred to as nonuniform recovery 
guarantees). We believe that some of the tools that we develop in this paper will potentially 
be useful in other compressive sensing scenarios, where the sensing matrix has coupled rows 
and columns. 

Our paper is organized as follows. In Section 2 we describe the setup of the imaging
problem and state our main results. As preparation for proving our main theorems, we derive
a general sparse recovery result in Section 3 and condition number estimates for certain random
matrices in Section 4. In Section 5 we prove the recovery of sparse vectors for sensing matrices
with dependent rows and columns which are associated with a class of bounded orthonormal
systems. This type of matrices includes the sensing matrix arising in the inverse scattering
problem as a special case. On the other hand, this result assumes that the non-zero coefficients
of the signal to be recovered have random phases. In Section 6 we remove the assumption of
random phases and show sparse recovery for the inverse scattering setup for signals with fixed
deterministic phases. In Section 7 we illustrate our theoretical results by numerical simulations.

2 Problem formulation and main results
2.1 Array imaging setup and problem formulation 

Suppose an array of $n$ transducers is located in the square $[0,B]^2$, where $B > 0$ is the array
aperture. The spatial part of a wave of wavelength $\lambda > 0$ emitted from some point source
$b \in [0,B]^2$ and recorded at another point $r \in \mathbb{R}^3$ is given by the Green's function $G$ of the
Helmholtz equation,

$$G(r, b) = \frac{\exp\big(2\pi i \|r - b\|_2/\lambda\big)}{4\pi \|r - b\|_2}. \tag{2.1}$$

Here and in the following $\|\cdot\|_p$ refers to the usual $\ell_p$-norm.

Fig. 2 The targets at distance $z_0$ distributed sparsely in the target domain (resolution grid above the antenna array)



Assume that we want to image the locations of targets which are at distance $z_0 > 0$. For
the analysis, we make the idealizing assumption that the targets are on a discretized grid of
meshsize $d_0 > 0$ in the target domain $TD := [-L, L]^2 \times \{z_0\}$, where $L > 0$ determines the size
of the target domain. To be more precise, let us assume that each target occupies one of the
points $(r_j)_{j \in [N]} \subset TD$, where $[N] := \{1, \ldots, N\}$ with $N = \lfloor 2L/d_0 \rfloor^2$ and each $r_j$ is of the form
$r_j = (-L + k d_0, -L + \ell d_0, z_0)^T$ for some $(k, \ell) \in [\sqrt{N}]^2$. See also Figure 2 for a visualization
of this setup.

In order to be able to analyze the arising sensing mechanism, we approximate the Green's
function from (2.1) in an adequate way. To this end, we assume to be in the far field region,
that is, the distance $z_0$ from antenna to target satisfies $z_0 \gg B + L$. Writing $r = (x, y, z_0)^T$
and $b = (\xi, \eta, 0)^T$, the truncated Taylor expansion for $\|r - b\|_2$ around $r_0 := (\xi, \eta, z_0)$ is given
by

$$\|r - b\|_2 \approx z_0 + \frac{\|(x,y) - (\xi,\eta)\|_2^2}{2 z_0}. \tag{2.2}$$

Under the far field assumption we then obtain that

$$G(r, b) \approx \tilde{G}(r, b) := \frac{\exp(2\pi i z_0/\lambda)}{4\pi z_0} \exp\Big(\frac{\pi i}{\lambda z_0} \|(x,y) - (\xi,\eta)\|_2^2\Big). \tag{2.3}$$

If we choose the meshsize do such that the crucial aperture condition [1 

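$$\frac{d_0 B}{\lambda z_0} \in \mathbb{N} \tag{2.4}$$

is fulfilled, then the normalized system of functions

$$\hat{G}(b, r_\ell) := 4\pi z_0\, \tilde{G}(b, r_\ell), \quad b \in [0,B]^2,\ \ell \in [N],$$

satisfies the convenient orthonormality relation

$$\frac{1}{B^2} \int_{[0,B]^2} \hat{G}(b, r_m) \overline{\hat{G}(b, r_\ell)}\, db
= \exp\Big(\frac{\pi i}{\lambda z_0}\big(\|(x_m, y_m)\|_2^2 - \|(x_\ell, y_\ell)\|_2^2\big)\Big)$$
$$\times \frac{1}{B^2} \int_{[0,B]} \int_{[0,B]} \exp\Big(-\frac{2\pi i}{\lambda z_0}(x_m - x_\ell)\xi\Big) \exp\Big(-\frac{2\pi i}{\lambda z_0}(y_m - y_\ell)\eta\Big)\, d\xi\, d\eta
= \delta_{\ell m}. \tag{2.5}$$

It is for this relation to hold that we make the approximation (2.3).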
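
The orthonormality relation (2.5) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the parameter values ($\lambda = 1$, $z_0 = 100$, $B = 10$, $d_0 = 10$), the function names `g_hat` and `gram_entry`, and the midpoint quadrature are all illustrative assumptions. The constant unimodular factor $\exp(2\pi i z_0/\lambda)$ is omitted because it cancels in the product with the complex conjugate.

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from the paper); they satisfy
# the aperture condition d0 * B / (lam * z0) = 1.
LAM, Z0, B, D0 = 1.0, 100.0, 10.0, 10.0

def g_hat(p_xy, xi, eta):
    """Far-field kernel exp(i*pi*||(x,y) - (xi,eta)||^2 / (lam*z0)), up to a constant phase."""
    return np.exp(1j * np.pi * ((p_xy[0] - xi) ** 2 + (p_xy[1] - eta) ** 2) / (LAM * Z0))

def gram_entry(pm_xy, pl_xy, grid=64):
    """Midpoint-rule approximation of (1/B^2) * integral over [0,B]^2 of
    G_hat(b, r_m) * conj(G_hat(b, r_l)) db."""
    t = (np.arange(grid) + 0.5) * B / grid          # midpoint nodes in [0, B]
    xi, eta = np.meshgrid(t, t, indexing="ij")
    return np.mean(g_hat(pm_xy, xi, eta) * np.conj(g_hat(pl_xy, xi, eta)))

# Two grid points of the form (-L + k*d0, -L + l*d0) with L = 40.
same = gram_entry((-40.0, -40.0), (-40.0, -40.0))   # diagonal entry
diff = gram_entry((-40.0, -40.0), (-30.0, -20.0))   # off-diagonal entry
print(abs(same), abs(diff))
```

For grid points that differ by integer multiples of $d_0$, the midpoint rule sums a full set of roots of unity, so the off-diagonal value vanishes up to rounding as long as `grid` exceeds the largest index difference.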

Let us now describe the scattering matrix. Assume we have a vector $(x_j)_{j \in [N]} \in \mathbb{C}^N$
of reflectivities on the resolution grid. We sample $n$ antenna positions $b_1, \ldots, b_n \in [0,B]^2$
independently at random according to the uniform distribution on $[0,B]^2$. If antenna element
$b_j \in [0,B]^2$ transmits and $b_k \in [0,B]^2$ receives, then we model the echo $y_{jk}$ as

$$y_{jk} = \sum_{\ell=1}^{N} \hat{G}(b_j, r_\ell)\, \hat{G}(r_\ell, b_k)\, x_\ell, \quad (j,k) \in [n]^2. \tag{2.6}$$

This is called the Born approximation [2]. It amounts to discarding multipath scattering
effects. So if the transmit-receive mode is that one antenna element transmits at a time and
the whole aperture receives the echo, the appropriately scaled sensing matrix $A \in \mathbb{C}^{n^2 \times N}$ is
given entrywise by

$$A_{(j,k),\ell} := \hat{G}(b_j, r_\ell)\, \hat{G}(r_\ell, b_k), \quad (j,k) \in [n]^2,\ \ell \in [N]. \tag{2.7}$$

Then $y = Ax$ by (2.6). Due to the randomness in the $b_k$, $k \in [n]$, the matrix $A$ is a (structured)
random matrix with coupled rows and columns.

In many scenarios the number of targets is small compared to the grid size. This naturally
leads to sparsity in the vector $x \in \mathbb{C}^N$ of reflectivities, $\|x\|_0 := \#\{\ell : x_\ell \neq 0\} \leq s$, where $s \ll N$.
Compressive sensing suggests that in such a scenario, we can recover $x$ from undersampled
measurements $y = Ax \in \mathbb{C}^{n^2}$ when $n^2 \ll N$. We note that $A$ contains only $n(n+1)/2$ different
rows due to the symmetries in the sensing setup. Our goal is to determine a good bound on the
required minimal number of antennas $n$ in order to ensure recovery of an $s$-sparse scene. A
small number of antennas has clear advantages such as low costs of imaging hardware.
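
To make the structure of the sensing matrix concrete, here is a minimal sketch that assembles the matrix from (2.7) for a small grid. The function name `scattering_matrix`, the parameter values, and the zero-based grid indexing are illustrative assumptions; the global unimodular phase of the normalized kernel is dropped since it does not affect the modulus or the row structure.

```python
import numpy as np

def scattering_matrix(antennas, targets_xy, lam, z0):
    """Return A with rows indexed by (j, k) in [n]^2 (row j*n + k) and columns by l in [N]:
    A[(j,k), l] = G_hat(b_j, r_l) * G_hat(r_l, b_k), up to a global constant phase."""
    # Squared planar distances ||(x_l, y_l) - b_j||^2, shape (n, N).
    d2 = ((antennas[:, None, :] - targets_xy[None, :, :]) ** 2).sum(axis=2)
    g = np.exp(1j * np.pi * d2 / (lam * z0))            # g[j, l] = G_hat(b_j, r_l)
    n, N = g.shape
    return (g[:, None, :] * g[None, :, :]).reshape(n * n, N)

# Illustrative values (assumptions): d0 * B / (lam * z0) = 1 satisfies the aperture condition.
rng = np.random.default_rng(0)
lam, z0, B, d0, L = 1.0, 100.0, 10.0, 10.0, 40.0
coords = -L + d0 * np.arange(int(2 * L / d0))            # 8 grid coordinates per axis (k = 0..7 here)
targets_xy = np.array([(x, y) for x in coords for y in coords])   # N = 64 grid points
antennas = rng.uniform(0.0, B, size=(5, 2))              # n = 5 antennas, uniform on [0, B]^2
A = scattering_matrix(antennas, targets_xy, lam, z0)     # shape (n^2, N) = (25, 64)
```

Rows $(j,k)$ and $(k,j)$ coincide because the product of the two kernels is symmetric in the antenna pair, which is why only $n(n+1)/2$ rows are distinct.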

2.2 Compressive sensing 

We briefly describe the basics of compressive sensing in order to place our results outlined below
into context. Given measurements $y = Ax \in \mathbb{C}^m$ of a sparse vector $x \in \mathbb{C}^N$, where $A \in \mathbb{C}^{m \times N}$
is the so-called measurement matrix, we would like to reconstruct $x$ in the underdetermined
case that $m \ll N$ by taking into consideration the sparsity.
The naive approach of $\ell_0$-minimization

$$\min_{z \in \mathbb{C}^N} \|z\|_0 \quad \text{subject to} \quad Az = y \tag{2.8}$$

is NP-hard [21]. Hence several tractable alternatives were proposed including $\ell_1$-minimization,
also called basis pursuit [TOj, [131 H! > 

$$\min_{z \in \mathbb{C}^N} \|z\|_1 \quad \text{subject to} \quad Az = y. \tag{2.9}$$

This can be seen as a convex relaxation of (2.8) and can be solved via efficient convex
optimization methods [3, 9]. It is by now well-understood that $\ell_1$-minimization can recover sparse
vectors under appropriate conditions. Remarkably, random matrices provide (near-) optimal
measurement matrices in this context and good deterministic constructions are lacking to date,
see [26, 15] for a discussion. For instance, an $m \times N$ Gaussian random matrix $A$ ensures exact
(and stable) recovery of all $s$-sparse vectors $x$ from $y = Ax$ using $\ell_1$-minimization (and other
types of algorithms) with high probability provided

$$m \geq C s \log(N/s), \tag{2.10}$$

where $C > 0$ is a universal constant. This bound is optimal [13, 16]. It is crucial that $m$
is allowed to scale linearly in $s$. The log-factor cannot be removed. Recovery is stable under
passing to approximately sparse vectors and under adding noise to the measurements. In the
latter case, one may rather work with the noise-constrained $\ell_1$-minimization problem

$$\min_{z \in \mathbb{C}^N} \|z\|_1 \quad \text{subject to} \quad \|Az - y\|_2 \leq \eta. \tag{2.11}$$

Random partial Fourier matrices [H [29l [25j [26] (that is, random row-submatrices of the 
discrete Fourier matrix) and other types of structured random matrices [26, 27] also provide
$s$-sparse recovery under similar conditions as in (2.10) (with additional log-factors).

Some of the mentioned recovery results are derived using the restricted isometry property 
(RIP) [HE]. This leads to uniform guarantees in the sense that once the matrix is selected, 
then with high probability every $s$-sparse vector can be recovered from $y = Ax$. The RIP,
however, is a rather strong condition which is sometimes hard to verify. In particular, it is
open to verify it for our random matrix in (2.7). Instead, we may work with weaker conditions,
which ensure nonuniform recovery in the sense that a fixed $s$-sparse vector is recovered with
high probability using a random draw of the matrix. Our result below for the structured
random matrix in (2.7) is based on the extension of certain general recovery conditions for
$\ell_1$-minimization [17, 32, 5] to stable recovery using a so-called dual certificate, see Section 3.
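
As an illustration of equality-constrained $\ell_1$-minimization (2.9), the sketch below solves a small instance with a Douglas-Rachford splitting iteration. This generic solver is swapped in purely for illustration and is not claimed to be the method used by the authors; the names `soft_threshold` and `basis_pursuit_dr`, the step size `gamma`, the iteration count, and the Gaussian test matrix are all assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Soft thresholding (valid for real and complex input): proximal map of tau * ||.||_1."""
    mag = np.abs(z)
    return z * np.maximum(1.0 - tau / np.maximum(mag, 1e-300), 0.0)

def basis_pursuit_dr(A, y, gamma=1.0, iters=5000):
    """Douglas-Rachford iteration for min ||z||_1 subject to Az = y.
    Assumes A has full row rank, so that the projection below is exact."""
    pinv = np.linalg.pinv(A)

    def project(w):
        # Orthogonal projection onto the affine set {z : Az = y}.
        return w - pinv @ (A @ w - y)

    w = pinv @ y
    for _ in range(iters):
        z = project(w)
        w = w + soft_threshold(2 * z - w, gamma) - z
    return project(w)

# Small illustrative instance: Gaussian matrix, 3-sparse vector.
rng = np.random.default_rng(1)
m, N, s = 40, 64, 3
A = rng.standard_normal((m, N)) / np.sqrt(m)
x0 = np.zeros(N)
x0[rng.choice(N, s, replace=False)] = [1.5, -2.0, 1.0]
x_hat = basis_pursuit_dr(A, A @ x0)
print(np.linalg.norm(x_hat - x0))
```

The final projection guarantees that the returned vector satisfies the measurement constraint; how closely it matches the sparse vector depends on the number of iterations and on the instance.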

2.3 Main results 

We define the error of best $s$-term approximation in the $\ell_1$-norm by

$$\sigma_s(x)_1 := \inf_{\|z\|_0 \leq s} \|x - z\|_1.$$

Furthermore, we will assume throughout that the aperture condition

$$\frac{d_0 B}{\lambda z_0} \in \mathbb{N} \tag{2.12}$$

holds, which can be accomplished by an appropriate choice of the meshsize $d_0$. The further
notation is the one used in Section 2.1. We will refer to the matrix $A \in \mathbb{C}^{n^2 \times N}$ in (2.7)
with the antenna positions $b_1, \ldots, b_n$ selected independently and uniformly at random from
$[0,B]^2$ as the random scattering matrix. Note that the aperture condition (2.12) implies that
$\mathbb{E} A^* A = n^2\, \mathrm{Id}$ by a similar computation as in (2.5), that is, in expectation the matrix $A^* A$
behaves nicely, which will be crucial in the proof. Let us now state our nonuniform recovery
result.

Theorem 2.1. Let $x \in \mathbb{C}^N$ and $A \in \mathbb{C}^{n^2 \times N}$ be a draw of the random scattering matrix. Let
$s \in \mathbb{N}$ be some sparsity level. Suppose we are given noisy measurements $y = Ax + e \in \mathbb{C}^{n^2}$ with
$\|e\|_2 \leq \eta n$. If, for $\varepsilon > 0$,

$$n^2 \geq C s \log^2\Big(\frac{cN}{\varepsilon}\Big) \tag{2.13}$$

with universal constants $C, c > 0$, then with probability at least $1 - \varepsilon$, the solution $\hat{x} \in \mathbb{C}^N$ to
the noise-constrained $\ell_1$-minimization problem

$$\min_{z \in \mathbb{C}^N} \|z\|_1 \quad \text{subject to} \quad \|Az - y\|_2 \leq \eta n \tag{2.14}$$

satisfies

$$\|x - \hat{x}\|_2 \leq C_1 \sqrt{s}\, \eta + C_2\, \sigma_s(x)_1. \tag{2.15}$$

The constants satisfy $C \leq (800 e^{3/4})^2 \approx 2.87 \cdot 10^6$, $c \leq 6$, $C_1 \leq 4(1 + \sqrt{2}) + 8\sqrt{3} \approx 23.513$,
$C_2 \leq 4(1 + \sqrt{6}) \approx 13.798$.


Remark 2.2. (a) The constants appearing in Theorem 2.1 are quite large and reflect a worst
case analysis. No attempt has been made to optimize the above bounds. In practice, much
better bounds can be expected, see also the numerical results below.

(b) The scaling of the noise level, $\|e\|_2 \leq \eta n$, is natural because $e \in \mathbb{C}^{n^2}$. Indeed, if we have
a componentwise bound $|e_j| = |(Ax)_j - y_j| \leq \eta$ for all $j \in [n]^2$ then it is satisfied.

(c) The error bound (2.15) is slightly worse than the one we would get under the RIP. In
fact, if $A$ has the RIP then the associated error bound improves the right hand side of
(2.15) by a factor of $s^{-1/2}$ [6]. Unfortunately, it is so far unknown whether the random
scattering matrix $A$ obeys the RIP under a similar condition as (2.13), so that the error
bound (2.15) is the best one can presently achieve.

(d) If $x$ is $s$-sparse, $\sigma_s(x)_1 = 0$, and if there is no noise, $\eta = 0$, then (2.15) implies exact
reconstruction, $\hat{x} = x$, by equality-constrained $\ell_1$-minimization (2.9).

(e) We can specialize the error bound in the previous theorem for the case of Gaussian noise.
To this end, assume that the components of $e \in \mathbb{C}^{n^2}$ are i.i.d. complex Gaussians with
variance $\eta^2$, where the real and imaginary part of a complex Gaussian are independent
real Gaussians with variance $\eta^2/2$. A standard calculation shows that the noise satisfies
$\|e\|_2 \leq \eta n \log(1/\varepsilon)$ with probability at least $1 - \varepsilon$. Assuming that $e$ is independent of the
matrix $A$, it follows that the solution $\hat{x}$ of noise-constrained $\ell_1$-minimization with bound
$\|Az - y\|_2 \leq \eta n \log(1/\varepsilon)$ satisfies

$$\|x - \hat{x}\|_2 \leq C_1 \eta \sqrt{s} \log(1/\varepsilon) + C_2\, \sigma_s(x)_1 \tag{2.16}$$

with probability at least $(1 - \varepsilon)^2$. The constants $C_1, C_2$ satisfy the bounds of Theorem 2.1.

Theorem 2.1 holds for a fixed, deterministic $x \in \mathbb{C}^N$. We define the sign of a number $a \in \mathbb{C}$ as

$$\mathrm{sgn}(a) = \begin{cases} a/|a| & \text{if } a \neq 0, \\ 0 & \text{if } a = 0. \end{cases}$$

For a vector $x \in \mathbb{C}^N$ we denote by $\mathrm{sgn}(x) := (\mathrm{sgn}(x_j))_{j \in [N]}$ the sign pattern of $x$. On the way
to the proof of Theorem 2.1, we will provide the easier result stated next for the case when
the sign pattern of $x$ restricted to its support set $T \subset [N]$, $\mathrm{sgn}(x)_T = (\mathrm{sgn}(x_j))_{j \in T}$, forms a
Rademacher or a Steinhaus sequence. The latter amounts to assuming that the phases of the
reflectivities are i.i.d. uniformly distributed on $[0, 2\pi]$, which is a common assumption in array
imaging and radar signal processing. Theorem 2.3 below actually establishes sparse recovery
in a more general setting than the inverse scattering problem. It is not only applicable to
the radar-type sensing matrices analyzed above, but to more general sensing matrices whose
rows and columns are not independent, and whose entries are associated with a certain class of
orthonormal systems. Its statement requires the notion of bounded orthonormal systems.



Definition 2.1. Let $D \subset \mathbb{R}^d$ be a measurable set and $\nu$ a probability measure on $D$. A system
of functions $\{\Phi_k : D \to \mathbb{C}\}_{k \in [N]}$ is called a bounded orthonormal system (BOS) with respect
to $(D, \nu)$ if

$$\int_D \Phi_k(t) \overline{\Phi_\ell(t)}\, \nu(dt) = \delta_{k\ell}$$

and if the functions are uniformly bounded by a constant $K \geq 1$,

$$\max_{k \in [N]} \|\Phi_k\|_\infty \leq K.$$

Let now $\{\Phi_\ell\}_{\ell \in [N]}$ be a BOS on $(D, \nu)$ with bounding constant $K = 1$ and with the property
that $\{\Phi_\ell^2\}_{\ell \in [N]}$ is also a BOS on $(D, \nu)$. Note that due to the orthogonality relation, we then
necessarily have $|\Phi_\ell(t)| = 1$ for all $t \in D$. The functions $\Phi_\ell(t) = \hat{G}(r_\ell, t)$, $t \in [0,B]^2$, fall into
this setup when the aperture condition (2.12) is satisfied, see also (2.5). Another example is
provided by the Fourier system $\{\Phi_\ell\}_{\ell \in \mathbb{Z}}$, where $\Phi_\ell(t) = e^{2\pi i \ell t}$, $\ell \in \mathbb{Z}$, $t \in [0,1]$. For $b_1, b_2 \in D$,
set

$$v(b_1, b_2) := \big(\overline{\Phi_\ell(b_1) \Phi_\ell(b_2)}\big)_{\ell \in [N]} \in \mathbb{C}^N.$$

Sample now $n$ elements $b_1, \ldots, b_n$ independently at random according to $\nu$ from $D$. Define the
sampling matrix $A$ via

$$A := \big(v(b_j, b_k)^*\big)_{(j,k) \in [n]^2} \in \mathbb{C}^{n^2 \times N}, \tag{2.17}$$

so that $A$ is the matrix with rows $v(b_j, b_k)^*$, $(j,k) \in [n]^2$. Note that with the system $\Phi_\ell(b) =
\hat{G}(r_\ell, b)$ we recover the random scattering matrix (2.7) in this way.

Now we can state our main result for random sign patterns. We recall that the entries of
a (random) Rademacher vector $\epsilon$ are independent random variables that take the values $\pm 1$
with equal probability. Similarly, a Steinhaus vector is a random vector where all entries are
independent and uniformly distributed on the complex torus $\{z \in \mathbb{C} : |z| = 1\}$.
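
For concreteness, Rademacher and Steinhaus vectors can be sampled as in the following minimal sketch; the function names `rademacher` and `steinhaus` are illustrative assumptions.

```python
import numpy as np

def rademacher(size, rng):
    """Independent entries taking the values +1 and -1 with equal probability."""
    return rng.choice(np.array([-1.0, 1.0]), size=size)

def steinhaus(size, rng):
    """Independent entries exp(i*theta) with theta uniform on [0, 2*pi)."""
    return np.exp(2j * np.pi * rng.random(size))

rng = np.random.default_rng(0)
eps = rademacher(8, rng)
theta = steinhaus(8, rng)
```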



Theorem 2.3. Let $A \in \mathbb{C}^{n^2 \times N}$ be a draw of the random sampling matrix from (2.17). Let
$x \in \mathbb{C}^N$ and $T \subset [N]$ be the index set corresponding to its $s$ largest absolute entries. Assume
that the sign vector $\mathrm{sgn}(x)_T$ of $x$ restricted to $T$ forms a Rademacher or a Steinhaus sequence.
Suppose we take noisy measurements $y = Ax + e \in \mathbb{C}^{n^2}$ with $\|e\|_2 \leq \eta n$. If
$$n^2 \geq C s \log\Big(\frac{c_1 (N - s)}{\varepsilon}\Big) \log^2\big(c_2 (N - s)^2 s/\varepsilon\big), \tag{2.18}$$

then with probability at least $1 - \varepsilon$, the solution $\hat{x} \in \mathbb{C}^N$ to noise-constrained $\ell_1$-minimization
(2.14) satisfies

$$\|x - \hat{x}\|_2 \leq C_1 \sqrt{s}\, \eta + C_2\, \sigma_s(x)_1. \tag{2.19}$$

The constants satisfy $C \leq 1024$, $c_1 \leq 8$, $c_2 \leq 576$, $C_1 \leq 4(1 + \sqrt{2}) + 8\sqrt{3} \approx 23.513$,
$C_2 \leq 4(1 + \sqrt{6}) \approx 13.798$.



Remark 2.4. Whereas the bounds on the constants in Theorem 2.1 are quite large, and
certainly improvable, in the case of random sign patterns, the number of antennas required must
satisfy

$$n \geq 32 \sqrt{s} \log^{3/2}(cN/\varepsilon),$$

which is a reasonable bound, see also the improvement in Remark 4.6 (b).

3 Stable sparse recovery via $\ell_1$-minimization

In this section we establish a general result for the recovery of an individual vector $x \in \mathbb{C}^N$
from noisy measurements $y = Ax + e \in \mathbb{C}^m$ with $A \in \mathbb{C}^{m \times N}$. It uses a dual vector in the spirit
of [17, 32] and extends these results to the noisy and non-sparse case. The proof is inspired by
[5] for recovery based on the weak RIP. However, since we actually do not assume the weak
RIP, the error bound in (3.5) below is slightly worse by a factor of $\sqrt{s}$ than the one in [5,
Section 4]. In the noiseless and exact sparse case the theorem below implies exact recovery
similar to [17, 32].

For a set $T \subset [N]$ and a matrix $A \in \mathbb{C}^{m \times N}$ with columns $a_j \in \mathbb{C}^m$, $j \in [N]$, we denote
by $A_T = (a_j)_{j \in T} \in \mathbb{C}^{m \times |T|}$ the column-submatrix of $A$ with columns indexed by $T$ and by
$T^c := [N] \setminus T$ the complement of $T$ in $[N]$. Similarly, we denote by $x_T \in \mathbb{C}^{|T|}$ the vector
$x \in \mathbb{C}^N$ restricted to its entries in $T$. The operator norm of a matrix $B$ on $\ell_2$ is denoted by
$\|B\|_{2 \to 2}$.

Theorem 3.1. Let $x \in \mathbb{C}^N$ and $A \in \mathbb{C}^{m \times N}$ with $\ell_2$-normalized columns, $\|a_j\|_2 = 1$, $j \in [N]$.
For $s \geq 1$, let $T \subset [N]$ be the set of indices of the $s$ largest absolute entries of $x$. Assume that
$A_T$ is well-conditioned in the sense that

$$\|A_T^* A_T - \mathrm{Id}\|_{2 \to 2} \leq \frac{1}{2} \tag{3.1}$$

and that there exists a dual certificate $u = A^* v \in \mathbb{C}^N$ with $v \in \mathbb{C}^m$ such that

$$u_T = \mathrm{sgn}(x)_T, \tag{3.2}$$
$$\|u_{T^c}\|_\infty \leq \frac{1}{2}, \tag{3.3}$$
$$\|v\|_2 \leq \sqrt{2s}. \tag{3.4}$$

Suppose we are given noisy measurements $y = Ax + e \in \mathbb{C}^m$ with $\|e\|_2 \leq \eta$. Then the solution
$\hat{x} \in \mathbb{C}^N$ to noise-constrained $\ell_1$-minimization (2.11) satisfies

$$\|x - \hat{x}\|_2 \leq C_1 \sqrt{s}\, \eta + C_2\, \sigma_s(x)_1. \tag{3.5}$$

The constants satisfy $C_1 \leq 4(1 + \sqrt{2}) + 8\sqrt{3} \approx 23.513$, $C_2 \leq 4(1 + \sqrt{6}) \approx 13.798$.

Remark 3.2. The constants appearing in the conditions above are rather arbitrary and chosen
for convenience.
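
The hypotheses of Theorem 3.1 can be inspected numerically for a given matrix and vector. The sketch below tries one particular candidate, the least-squares ansatz $v = A_T (A_T^* A_T)^{-1} \mathrm{sgn}(x)_T$, which satisfies (3.2) by construction; whether (3.1), (3.3) and (3.4) hold must then be checked. Using this ansatz here is an illustrative assumption, as are the names `sgn` and `certificate_check` and the Gaussian test matrix.

```python
import numpy as np

def sgn(x):
    """Entrywise complex sign: x/|x| for nonzero entries, 0 otherwise."""
    out = np.zeros_like(x, dtype=complex)
    nz = x != 0
    out[nz] = x[nz] / np.abs(x[nz])
    return out

def certificate_check(A, x, s):
    """Build v = A_T (A_T^* A_T)^{-1} sgn(x)_T and report the quantities in (3.1)-(3.4)."""
    T = np.sort(np.argsort(-np.abs(x))[:s])          # indices of the s largest |x_j|
    Tc = np.setdiff1d(np.arange(A.shape[1]), T)
    AT = A[:, T]
    gram = AT.conj().T @ AT
    v = AT @ np.linalg.solve(gram, sgn(x[T]))
    u = A.conj().T @ v
    return {
        "cond": np.linalg.norm(gram - np.eye(s), 2),      # (3.1) asks for <= 1/2
        "u_T_error": np.linalg.norm(u[T] - sgn(x[T])),    # (3.2): zero up to rounding
        "u_Tc_max": np.max(np.abs(u[Tc])),                # (3.3) asks for <= 1/2
        "v_norm": np.linalg.norm(v),                      # (3.4) asks for <= sqrt(2s)
    }

# Illustrative instance: complex Gaussian matrix with l2-normalized columns.
rng = np.random.default_rng(2)
A = rng.standard_normal((60, 100)) + 1j * rng.standard_normal((60, 100))
A /= np.linalg.norm(A, axis=0)
x = np.zeros(100, dtype=complex)
x[[3, 17, 42]] = [1.0, -2.0j, 0.5 + 0.5j]
report = certificate_check(A, x, 3)
print(report)
```

For this ansatz, $\|v\|_2^2 = \mathrm{sgn}(x)_T^* (A_T^* A_T)^{-1} \mathrm{sgn}(x)_T$, so (3.1) already implies (3.4); the condition that genuinely needs to be verified is (3.3).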



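Proof. Write $\hat{x} = x + h$. Due to (2.11) and the assumption on the noise level, $\|e\|_2 \leq \eta$, we
have

$$\|Ah\|_2 = \|A\hat{x} - y - (Ax - y)\|_2 \leq \|A\hat{x} - y\|_2 + \|Ax - y\|_2 \leq 2\eta. \tag{3.6}$$

Since $x$ is feasible for the optimization program (2.11) we obtain

$$\|x\|_1 \geq \|\hat{x}\|_1 = \|(x + h)_T\|_1 + \|(x + h)_{T^c}\|_1$$
$$\geq \mathrm{Re}\big(\langle (x + h)_T, \mathrm{sgn}(x)_T \rangle\big) + \|h_{T^c}\|_1 - \|x_{T^c}\|_1$$
$$= \|x\|_1 + \mathrm{Re}\big(\langle h_T, \mathrm{sgn}(x)_T \rangle\big) + \|h_{T^c}\|_1 - 2\|x_{T^c}\|_1,$$

where we applied Hölder's and the triangle inequality in the second line. Rearranging the
above yields

$$\|h_{T^c}\|_1 \leq |\mathrm{Re}(\langle h_T, \mathrm{sgn}(x)_T \rangle)| + 2\|x_{T^c}\|_1. \tag{3.7}$$

Let $u = A^* v$ be the dual certificate. Then, using the Cauchy-Schwarz and Hölder's inequality,

$$|\mathrm{Re}(\langle h_T, \mathrm{sgn}(x)_T \rangle)| = |\mathrm{Re}(\langle h_T, (A^* v)_T \rangle)| \leq |\langle h, A^* v \rangle| + |\langle h_{T^c}, u_{T^c} \rangle|$$
$$\leq \|Ah\|_2 \|v\|_2 + \|h_{T^c}\|_1 \|u_{T^c}\|_\infty \leq 2\sqrt{2s}\, \eta + \frac{1}{2}\|h_{T^c}\|_1,$$

where we used (3.3) and (3.4) in the last line. Plugging into (3.7) yields

$$\|h_{T^c}\|_1 \leq 4\sqrt{2s}\, \eta + 4\|x_{T^c}\|_1. \tag{3.8}$$

Due to (3.1), we have

$$\frac{1}{2}\|h_T\|_2^2 \leq \|A_T h_T\|_2^2 = \langle A_T h_T, Ah \rangle - \langle A_T h_T, A_{T^c} h_{T^c} \rangle. \tag{3.9}$$

Using Hölder's inequality, the normalization of the columns of $A$ and (3.6), we obtain

$$|\langle A_T h_T, Ah \rangle| \leq \|h_T\|_1 \|A_T^* A h\|_\infty \leq 2\sqrt{s}\, \eta\, \|h_T\|_2.$$

The triangle inequality and the Cauchy-Schwarz inequality give, by noting that (3.1) implies
$\|A_T\|_{2 \to 2} \leq \sqrt{3/2}$,

$$|\langle A_T h_T, A_{T^c} h_{T^c} \rangle| \leq \sum_{j \in T^c} |h_j|\, |\langle A_T h_T, a_j \rangle| \leq \sum_{j \in T^c} |h_j|\, \|A_T h_T\|_2 \|a_j\|_2 \leq \sqrt{\tfrac{3}{2}}\, \|h_T\|_2 \|h_{T^c}\|_1.$$

Inserting into (3.9) we obtain

$$\|h_T\|_2 \leq 4\sqrt{s}\, \eta + \sqrt{6}\, \|h_{T^c}\|_1. \tag{3.10}$$

Combining (3.8) and (3.10) we arrive at

$$\|h\|_2 \leq \|h_T\|_2 + \|h_{T^c}\|_1 \leq \big(4(1 + \sqrt{2}) + 8\sqrt{3}\big)\sqrt{s}\, \eta + 4(1 + \sqrt{6})\|x_{T^c}\|_1.$$

Due to the choice of $T$ we have $\|x_{T^c}\|_1 = \sigma_s(x)_1$. This completes the proof. □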



4 Conditioning of submatrices

Theorem 3.1 requires to find a dual certificate $u = A^* v$ with $u_T = \mathrm{sgn}(x)_T$, where $A$ is the
random scattering matrix introduced in Section 2.1 and $T \subset [N]$ is some support set. Condition
(3.1) in Theorem 3.1 suggests to investigate the conditioning of $A_T$. Recall that

$$v(b_j, b_k) = \big(\overline{\Phi_\ell(b_j) \Phi_\ell(b_k)}\big)_{\ell \in [N]} \in \mathbb{C}^N,$$

where $\{\Phi_\ell\}_{\ell \in [N]}$ is a bounded orthonormal system with constant $K = 1$ such that $\{\Phi_\ell^2\}_{\ell \in [N]}$ is also
a bounded orthonormal system. The rows of the random scattering matrix $A \in \mathbb{C}^{n^2 \times N}$ are
the vectors $v(b_j, b_k)^* \in \mathbb{C}^{1 \times N}$, $(j,k) \in [n]^2$, where the $b_1, \ldots, b_n$ are selected independently at
random according to the orthonormalization measure $\nu$, see (2.17) and Definition 2.1. The
scattering matrix $A$ in (2.7) is a special case of this setup.

We aim at a probabilistic estimate of the largest and smallest singular value of $\frac{1}{n} A_T \in \mathbb{C}^{n^2 \times s}$,
i.e., the operator norm

$$\Big\|\frac{1}{n^2} A_T^* A_T - \mathrm{Id}\Big\|_{2 \to 2} = \Big\|\frac{1}{n^2} \sum_{j,k=1}^{n} v(b_j, b_k)_T\, v(b_j, b_k)_T^* - \mathrm{Id}\Big\|_{2 \to 2}. \tag{4.1}$$

The central result of this section stated next provides an estimate of the tail of this quantity. 

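Theorem 4.1. Let $A \in \mathbb{C}^{n^2 \times N}$ be the random matrix described above and let $T \subset [N]$ be a
(fixed) subset of cardinality $|T| = s$. If, for $\delta, \varepsilon > 0$,

$$n^2 \geq 1024\, \delta^{-2} s \log^2(576 s^3/\varepsilon), \tag{4.2}$$

then

$$\mathbb{P}\Big(\Big\|\frac{1}{n^2} A_T^* A_T - \mathrm{Id}\Big\|_{2 \to 2} \geq \delta\Big) \leq \varepsilon. \tag{4.3}$$

The proof will be given after some auxiliary results are presented.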
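
The quantity controlled by Theorem 4.1 is easy to simulate. The following minimal sketch uses the Fourier system $\Phi_\ell(t) = e^{2\pi i \ell t}$ on $[0,1]$ as the bounded orthonormal system, draws the sampling points, and returns the spectral-norm deviation in (4.1); it also evaluates the sufficient number of sampling points from condition (4.2). The function names `conditioning_deviation` and `min_antennas` and all parameter values are illustrative assumptions.

```python
import math
import numpy as np

def conditioning_deviation(n, T, rng):
    """Draw b_1..b_n uniformly from [0,1] and return ||A_T^* A_T / n^2 - Id||_{2->2}
    for the sampling matrix built from Phi_l(t) = exp(2*pi*i*l*t), l in T."""
    b = rng.random(n)
    phi = np.exp(2j * np.pi * np.outer(b, T))            # phi[j, l] = Phi_l(b_j), shape (n, s)
    AT = (phi[:, None, :] * phi[None, :, :]).reshape(n * n, len(T))
    gram = AT.conj().T @ AT / n**2
    return np.linalg.norm(gram - np.eye(len(T)), 2)

def min_antennas(s, delta, eps):
    """Smallest integer n with n^2 >= 1024 * delta^-2 * s * log^2(576 s^3 / eps), cf. (4.2)."""
    bound = 1024.0 * s * math.log(576.0 * s**3 / eps) ** 2 / delta**2
    return math.ceil(math.sqrt(bound))

rng = np.random.default_rng(3)
print(conditioning_deviation(200, np.array([1, 4, 9]), rng))   # empirical deviation for n = 200, s = 3
print(min_antennas(3, 0.5, 0.1))                               # n required by the sufficient condition
```

Comparing the two printed values for the same $s$ gives a feel for how conservative the constants in (4.2) are relative to a single random draw.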
4.1 Auxiliary results 

The fact that the rows of A are not independent makes the analysis difficult at first sight. 
In order to increase the amount of independence, we will use a version of the tail decoupling 
inequality in Theorem 3.4.1 of [12]. For convenience, we provide a short proof, which essentially
repeats the one in [11] in our slightly more general setup. In this way, we also obtain better
constants than by tracing the ones in the proof of [12, Theorem 3.4.1].



Theorem 4.2. Let $(X_i)_{i \in [n]}$, $n \geq 2$, be independent random variables with values in a
measurable space $\Omega$. Let $h : \Omega \times \Omega \to B$ be a measurable map with values in a separable Banach
space $B$ with norm $\|\cdot\|$. Then there exists a subset $S \subset [n]$ such that

$$\mathbb{P}\Big(\Big\|\sum_{i \neq j} h(X_i, X_j)\Big\| \geq t\Big) \leq 36\, \mathbb{P}\Big(4\Big\|\sum_{i \in S, j \in S^c} h(X_i, X_j)\Big\| \geq t\Big) \vee 36\, \mathbb{P}\Big(4\Big\|\sum_{i \in S^c, j \in S} h(X_i, X_j)\Big\| \geq t\Big), \tag{4.4}$$

where for $a, b \in \mathbb{R}$ we denote $a \vee b := \max\{a, b\}$.

The proof of Theorem 4.2 employs Corollary 3.3.8 from [12].

Lemma 4.3. Let $(B, \|\cdot\|)$ be a separable Banach space and let $Y$ be a $B$-valued random vector
such that for each $\xi \in B^*$, the dual space of $B$, the map $\xi(Y)$ is measurable, centered and
square integrable. Then, for every $x \in B$,

$$\mathbb{P}\big(\|x + Y\| \geq \|x\|\big) \geq \frac{1}{4} \inf_{\xi \in B^*} \Big(\frac{\mathbb{E}|\xi(Y)|}{(\mathbb{E}|\xi(Y)|^2)^{1/2}}\Big)^2. \tag{4.5}$$



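Proof of Theorem 4.2. Set $\mathcal{D} := (X_i)_{i \in [n]}$ and let $\epsilon = (\epsilon_1, \ldots, \epsilon_n)$ be a Rademacher sequence
independent of $\mathcal{D}$. We introduce

$$Z := \sum_{i \neq j} \big(h(X_i, X_j) - \epsilon_i \epsilon_j h(X_i, X_j)\big) \tag{4.6}$$

and $Y := -\sum_{i \neq j} \epsilon_i \epsilon_j h(X_i, X_j)$. Observe that

$$\mathbb{E}[Z \mid \mathcal{D}] = \sum_{i \neq j} h(X_i, X_j).$$

Let $\xi$ be an element of the dual space $B^*$. Conditional on $\mathcal{D}$, $\xi(Y)$ is a homogeneous
scalar-valued Rademacher chaos of order 2. By Hölder's inequality, we have for an arbitrary random
variable $V$ with finite fourth moment that

$$\mathbb{E}|V|^2 \leq (\mathbb{E}|V|)^{1/2} (\mathbb{E}|V|^3)^{1/2} \leq (\mathbb{E}|V|)^{1/2} (\mathbb{E}|V|^2)^{1/4} (\mathbb{E}|V|^4)^{1/4}$$

and therefore

$$\frac{\mathbb{E}|V|}{(\mathbb{E}|V|^2)^{1/2}} \geq \frac{\mathbb{E}|V|^2}{(\mathbb{E}|V|^4)^{1/2}}. \tag{4.7}$$

Lemma 2.1 from [11] states that

$$\big(\mathbb{E}[|\xi(Y)|^4 \mid \mathcal{D}]\big)^{1/2} \leq 3\, \mathbb{E}[|\xi(Y)|^2 \mid \mathcal{D}].$$

Plugging this result into (4.7) gives

$$\frac{\mathbb{E}[|\xi(Y)| \mid \mathcal{D}]}{\big(\mathbb{E}[|\xi(Y)|^2 \mid \mathcal{D}]\big)^{1/2}} \geq \frac{\mathbb{E}[|\xi(Y)|^2 \mid \mathcal{D}]}{\big(\mathbb{E}[|\xi(Y)|^4 \mid \mathcal{D}]\big)^{1/2}} \geq \frac{1}{3}.$$

Taking into account (4.6), an application of Lemma 4.3 yields

$$\mathbb{P}\Big(\|Z\| \geq \Big\|\sum_{i \neq j} h(X_i, X_j)\Big\| \,\Big|\, \mathcal{D}\Big) \geq \frac{1}{4}\Big(\frac{1}{3}\Big)^2 = \frac{1}{36}. \tag{4.8}$$

Multiplying both sides of (4.8) by the characteristic function $\chi$ of the event
$\big\{\big\|\sum_{i \neq j} h(X_i, X_j)\big\| \geq t\big\}$ and taking the expectation with respect to $\mathcal{D}$ gives

$$\mathbb{P}\Big(\Big\|\sum_{i \neq j} h(X_i, X_j)\Big\| \geq t\Big) \leq 36\, \mathbb{P}(\|Z\| \geq t) = 36\, \mathbb{E}_\epsilon\big[\mathbb{E}[\chi_{\{\|Z\| \geq t\}} \mid \epsilon]\big]. \tag{4.9}$$

We conclude by noting that there is a vector $\epsilon^* \in \{\pm 1\}^n$ such that

$$\mathbb{E}[\chi_{\{\|Z\| \geq t\}} \mid \epsilon^*] \geq \mathbb{E}_\epsilon\big[\mathbb{E}[\chi_{\{\|Z\| \geq t\}} \mid (\epsilon_1, \ldots, \epsilon_n)]\big].$$

The claim now follows by setting $S := \{i \in \{1, \ldots, n\} \mid \epsilon_i^* = 1\}$. □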
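We will moreover need the following complex version of Hoeffding's inequality from [22,
equation (9)].

Theorem 4.4. Let $\xi_1, \ldots, \xi_n$ be complex, independent and centered random variables satisfying
$|\xi_k| \leq \alpha_k$ for constants $\alpha_1, \ldots, \alpha_n > 0$. Set $\sigma^2 := \sum_{k=1}^n \alpha_k^2$. Then

$$\mathbb{P}\Big(\Big|\sum_{k=1}^n \xi_k\Big| \geq t\Big) \leq 4 \exp\Big(-\frac{t^2}{4\sigma^2}\Big). \tag{4.10}$$

The final tool to prove that submatrices of $A$ are well-conditioned is the noncommutative
Bernstein inequality from [31].

Theorem 4.5. Let $X_1, \ldots, X_n \in \mathbb{C}^{s \times s}$ be a sequence of independent, mean zero and self-adjoint
random matrices. Assume that, for some $K > 0$,

$$\|X_\ell\|_{2 \to 2} \leq K \quad \text{almost surely for all } \ell \in [n], \tag{4.11}$$

and set

$$\sigma^2 := \Big\|\sum_{\ell=1}^n \mathbb{E} X_\ell^2\Big\|_{2 \to 2}. \tag{4.12}$$

Then, for $t > 0$, it holds that

$$\mathbb{P}\Big(\Big\|\sum_{\ell=1}^n X_\ell\Big\|_{2 \to 2} \geq t\Big) \leq 2s \exp\Big(-\frac{t^2/2}{\sigma^2 + Kt/3}\Big). \tag{4.13}$$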

(4.13) 



4.2 Proof of Theorem 14.11 

Denote by 

Dj :=diag :£eT) G 



the diagonal matrix with diagonal consisting of the vector (<J>^(&j) I G C s and introduce 



g(bk) := ( <5>e{h) ) G C s . Since -Dj-D* = Id we observe that 



n- 



-A* T A T - Id 



1 n ^ n / n \ 

^ £ KM*)WM*)t -M] = ^i E - Id] D*. 

j,fc=i j=i \fe=i / 



(4.14) 
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Let b' := (6^, . . . , b' n ) denote an independent copy of b := (61, . . . , b n ). By the triangle inequal- 
ity, we have 



n- 



-A* T A T - Id 



>S\< 



2^2 



1 



11- 



} J [v{bj, b k )Tv(bj,b k )T ~ W] 



+ : 



1 

n 2 



> 



2^2 



E M b J' b j)T v i b ji b j)*T ~ Id l 



2^2 



Using the decoupling inequality of Theorem 4.2 with S* C [n] denoting the corresponding set, 
and the symmetry relation v(bj, bf.) = v(bk, bj), we obtain for the first term above 



n- 



E [v(bj, bk)Tv(bj,b k )T - Id] 



> 



2^2 



<36P 



2 [«(6i,65fe)r«(&i.^)T-Id] 
jes,kes c 



~ 8 



(4.15) 



2^2 



We will now estimate the right hand side of (4.15). Introducing 



x' ■= E b(*4M«4)* - id] e 

fceS c 



we observe that (|4.14|) together with Fubini's theorem yields 

6 



36P 



=36P 



1 



ii- 



ii- 



jes,fces c 



> 



2^2 



ie5 



> 



Eb' 



2^2 



36P b 



E^ x,jD ; 



> 



2-s>2 



(4.16) 



As the next step we apply the noncommutative Bernstein inequality, Theorem 4.5, to the inner 
probability in (4.16). Since Dj is a unitary matrix and the functions <&i are orthonormal we 
have 



I D i X ' D j 1 1 2^2 ~ 1 1 X ' 1 1 2^2 



(4.17) 



E [(Z^-X'D*) 2 ] = diag(X /2 ), 

where diag (AT' 2 ) denotes the matrix that coincides with X' 2 on the diagonal and is zero 
otherwise. Set fi to be the coherence parameter 



H := max 

t,ieT:e<i 



E *tfk)*&k 



A crucial observation is that diag (A /2 ) ■< (s — 1) diag (/x 2 ,/x 2 , . . . ,/x 2 ), where ^ denotes the 
semidefinite ordering. Therefore, it holds that 



^E[(DjX>D* 



*\21 



< \S\ (s-l)fi 2 < n(s-l)^ 2 



(4.18) 



2->2 
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Plugging the bounds (4.17) and (4.18) into (4.13) yields 



> o ) < 2sexp 
8 



2^2 



128(8-1) 2 , 165 || y,u 



(4.19) 



Set e = e/36. Multiplying the inner probability in (4.16) by the characteristic function of the 
event E := E\ D -E2, where 

:= f l28(.-l) , 



£2 := 



16 



Ix'lL „< 



21og(8s/e) J ' 
5 



3n 2 II ll2->2 - 21og(8s/e) J ' 
we obtain, with and E^ denoting the complements of E\ and E2, 



361 



ie5 



> I ) < | + 3G(2.s-?(£" l ) + 2.s-?(^)) 



(4.20) 



2^2 



Therefore, it remains to estimate the probabilities of the events Ef and E%- For the event Ef, 
the union bound over all s(s — l)/2 < s 2 /2 two element subsets of T implies in the case of a 
general BOS that 



m{2sV(E{)) < 72sP [ |J 

a,hTi<£ 



128(s - 1) 



E 



fceS c 



> 



21og(8s/e) 



< 



72s E P ( 

e,ieT,£<£ \ 



< 144s' 3 exp 



E 

fceS c 

n 2 £ 2 \ 
1024s log (8s /e) J ' 



> 



«5n 3 / 2 



y^56slbg (8s/e) 



(4.21) 



(4.22) 



where we have applied Hoeffding's inequality in the form of Theorem 4.4 in the last line. The
right hand side of (4.22) is less than $\varepsilon/4$ provided

$$n^2 \ge 1024\,\delta^{-2}\, s \log^2(576 s^3/\varepsilon). \tag{4.23}$$

As for $E_2^c$, we are going to apply the noncommutative Bernstein inequality again. Noting that



\g(b' k )g(b' k r-ld\ 



E E [{9{b' k )g{b k )* - Id)' 

kes c 



2-s>2 



2^2 



we obtain 



36 (2.s?(/-;0) < 1 l l.s 2 exp 



8-1, 



\S c \(s-l) <n(s-l) 



^ 2 



) i 4 J log 2 (8s/e) + f<5^1og(8s/£) 



(4.24)

Assuming (4.23), the right hand side of (4.24) is less than $\varepsilon/4$. Since $\{\Phi_k^2\}_{k\in[N]}$ is also a BOS
with respect to $(D,\nu)$, Condition (4.23) implies, after another application of the noncommutative
Bernstein inequality analogously to (4.24) and the preceding steps, that

6 



1 



n- 



v{bj,bj) T v(bj,bj)* T - Id] 



> 



< 



2->2 



This concludes the proof. 



(4.25) 



□ 



Remark 4.6. (a) In order to show (4.25), we used the assumption that $\{\Phi_k^2\}_{k\in[N]}$ is also a
BOS with respect to $(D,\nu)$. It might be that (4.25) also holds under weaker assumptions
on the BOS; however, it does not hold if we choose, for example, the Hadamard system.

(b) Assuming the special case of the scattering matrix (2.1), the terms in (4.21) take the
form



$e(b k )$i(bk) = exp 



7T7 

Xz 



exp 



2vri 
Xz 



{(rp-r e ),b k ) 



where due to the aperture condition (2.4)

$$\theta_k := \exp\Big(\frac{2\pi i}{\lambda z_0}\langle r_{\ell'} - r_\ell, b_k\rangle\Big)$$

is a Steinhaus random variable and $\theta := (\theta_1,\ldots,\theta_n)$ is a Steinhaus sequence. We can
therefore apply Hoeffding's inequality for Steinhaus sequences, see [25, Corollary 6.13].
This inequality states that, for arbitrary $v \in \mathbb{C}^n$, $t > 0$ and $\kappa \in (0,1)$,

$$P\big(|\langle v,\theta\rangle| \ge t\big) \le \frac{1}{1-\kappa}\exp\Big(-\frac{\kappa t^2}{\|v\|_2^2}\Big). \tag{4.26}$$

Applying this result with $\kappa = 4/5$ instead of Theorem 4.4 in (4.21), one obtains that the
claimed spectral norm estimate (4.3) holds under the slightly improved condition

„^>320rtlog(?f£)log(^), (4.27) 



where we have also taken into consideration the precise form of (4.22).
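The tail bound for Steinhaus sequences used in part (b) can be checked empirically. The following is a minimal Monte Carlo sketch, assuming the bound in the form $P(|\sum_j \theta_j v_j| \ge t\|v\|_2) \le (1-\kappa)^{-1}e^{-\kappa t^2}$; the function names, the test vector and the sample size are illustrative choices and do not come from the paper.

```python
import numpy as np

def steinhaus_tail(v, t, trials=20000, seed=0):
    """Empirical estimate of P(|sum_j theta_j v_j| >= t * ||v||_2)."""
    rng = np.random.default_rng(seed)
    # Steinhaus sequence: independent phases, uniform on the unit circle.
    theta = np.exp(2j * np.pi * rng.random((trials, v.size)))
    sums = np.abs(theta @ v)
    return float(np.mean(sums >= t * np.linalg.norm(v)))

def hoeffding_steinhaus_bound(t, kappa):
    """Assumed form of the bound: exp(-kappa t^2) / (1 - kappa)."""
    return float(np.exp(-kappa * t**2) / (1.0 - kappa))

v = np.arange(1, 51, dtype=complex)  # arbitrary test vector (hypothetical)
kappa = 4 / 5                        # the value of kappa used in the remark
for t in (1.0, 2.0, 3.0):
    print(t, steinhaus_tail(v, t), hoeffding_steinhaus_bound(t, kappa))
```

In such an experiment the empirical tail should stay below the bound for every $t$; the bound is not expected to be tight.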



5 Nonuniform Recovery of Scatterers with Random Phase 



Proof of Theorem 2.3. The key idea of the proof is to apply Theorem 3.1. Note first that (2.14)
is equivalent to

$$\operatorname*{arg\,min}_{z\in\mathbb{C}^N}\|z\|_1 \quad\text{subject to}\quad \Big\|\frac{1}{n}Az - \frac{1}{n}y\Big\|_2 \le \eta. \tag{5.1}$$



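Let $T \subset [N]$ be the index set corresponding to the $s$ largest absolute entries of $x$ and assume
that $\operatorname{sgn}(x)_T$ is either a Rademacher or a Steinhaus sequence. Suppose we are on the event

$$E := \Big\{\Big\|\frac{1}{n^2}A_T^*A_T - \mathrm{Id}\Big\|_{2\to 2} \le \frac12\Big\}. \tag{5.2}$$

Theorem 4.1 states that $P[E^c] \le \varepsilon/2$ if

$$n^2 \ge 4096\, s \log^2(1152 s^3/\varepsilon). \tag{5.3}$$

Set $\tilde A := \frac{1}{n}A$. The event $E$ means in particular that $\tilde A_T$ fulfills condition (3.1). We define the
vector $v \in \mathbb{C}^{n^2}$ in Theorem 3.1 via

$$v := (\tilde A_T^\dagger)^*\operatorname{sgn}(x)_T = \tilde A_T(\tilde A_T^*\tilde A_T)^{-1}\operatorname{sgn}(x)_T, \tag{5.4}$$

where $\tilde A_T^\dagger$ denotes the pseudo-inverse of $\tilde A_T$. Setting $u := \tilde A^* v$, we have $u_T = \tilde A_T^* v = \operatorname{sgn}(x)_T$,
so that (3.2) is satisfied. Since we are on the event $E$, the smallest singular value of $\tilde A_T$ satisfies
$\sigma_{\min}(\tilde A_T) \ge 1/\sqrt{2}$ and therefore

$$\|v\|_2 \le \|\tilde A_T^\dagger\|_{2\to 2}\,\|\operatorname{sgn}(x)_T\|_2 \le \sigma_{\min}(\tilde A_T)^{-1}\sqrt{s} \le \sqrt{2s}.$$

Hence, also (3.4) is satisfied. It remains to check (3.3). To this end, note that

$$\|u_{T^c}\|_\infty = \max_{\ell\in T^c}\big|\langle(\tilde A_T^*\tilde A_T)^{-1}\tilde A_T^*\tilde a_\ell,\operatorname{sgn}(x)_T\rangle\big| = \max_{\ell\in T^c}\big|\langle\tilde A_T^\dagger\tilde a_\ell,\operatorname{sgn}(x)_T\rangle\big|.$$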
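The construction of the dual vector via the pseudo-inverse can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses a complex Gaussian matrix with approximately unit-norm columns as a stand-in for the rescaled scattering matrix, and all sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, s = 400, 50, 5  # hypothetical sizes: measurements, grid points, sparsity

# Stand-in for the rescaled sampling matrix (assumption: Gaussian, not (2.7)).
A = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2 * m)

T = rng.choice(N, size=s, replace=False)   # support set
sgn = np.exp(2j * np.pi * rng.random(s))   # Steinhaus sign pattern on T
A_T = A[:, T]

# v = A_T (A_T^* A_T)^{-1} sgn(x)_T, and u = A^* v.
gram = A_T.conj().T @ A_T
v = A_T @ np.linalg.solve(gram, sgn)
u = A.conj().T @ v

delta = np.linalg.norm(gram - np.eye(s), 2)  # ||A_T^* A_T - Id||_{2->2}
off_support = np.delete(np.abs(u), T)

print("delta               :", delta)
print("u_T equals sgn(x)_T :", np.allclose(u[T], sgn))
print("||v||_2, sqrt(2s)   :", np.linalg.norm(v), np.sqrt(2 * s))
print("max |u| off support :", off_support.max())
```

Whenever `delta` is at most 1/2, the bound on the norm of `v` by $\sqrt{2s}$ follows deterministically; the size of `u` off the support is the quantity that the remainder of the proof controls.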



As in the previous section, we denote $b = (b_1,\ldots,b_n)$. Since $\operatorname{sgn}(x)_T =: (\theta_\ell)_{\ell\in T} =: \theta$ is
a Rademacher or a Steinhaus sequence, condition (5.3), Fubini's Theorem and Hoeffding's
inequality for Rademacher resp. Steinhaus sequences together with the union bound give



f max (ALa^, sgn( 



XT 



XE^Vei (A T a e , sgn(x) T ) 
/ 

xe ^2 2 exp 



> 



max 



+ 



(A T a e , sgn(x) T ) 



V 



AJpCLi 



+ 



<2(iV-s)E b 



/ 



X£ cxp 



l 8 max.£ £ T c 



T CL£ 



+ 



(5.5) 



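Since we are on the event $E$ from (5.2), it follows as before that $\|(\tilde A_T^*\tilde A_T)^{-1}\|_{2\to 2} \le 2$ and therefore

$$\max_{\ell\in T^c}\|\tilde A_T^\dagger\tilde a_\ell\|_2^2 \le 4\max_{\ell\in T^c}\|\tilde A_T^*\tilde a_\ell\|_2^2 \le 4s\max_{\ell\in T^c,\,k\in T}|\langle\tilde a_\ell,\tilde a_k\rangle|^2.$$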



Set 



Since 



/x := max 



k=l 



£$/(& fc )^( 6 *) 



fe=l 
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we have 



We then obtain 



2(N - s)E b 



max 



XE exp 



n' 



1 



< 2(N - s)E h 



<- + 2{N 




^ 8 max( £ r< 
1 

~324/u 4 

11 > 



Aj,CL£ 



(32log(8(N-s)/e)) 1 /\ 
Applying the union bound and Hoeffding's inequality as in (4.22) gives



2(N - s)P h 



,1/4 



n 



< 8(N -s) 2 sexp 



(321og(8(iV- S )/6)) 1 / 4 



W^s log (8(N -s)/e) 



(5.6)

The condition

$$n \ge 32\sqrt{s}\,\log^{1/2}\big(8(N-s)/\varepsilon\big)\log\big(576(N-s)^2 s/\varepsilon\big) \tag{5.7}$$

implies that the right hand side of equation (5.6) is less than $\varepsilon/4$. Assuming $s \le N/3$ and
$8(N-s)/\varepsilon \ge e^4$, (5.7) implies (5.3) and therefore also $P(E^c) \le \varepsilon/2$, where $E$ is the event
from (5.2). We have thus verified that under condition (5.7), all conditions of Theorem 3.1 are
satisfied with probability at least $1-\varepsilon$. Since we work with the rescaled version (5.1) of $A$,
the solution $\hat x$ satisfies (2.19) with the required probability. This finishes the proof of Theorem
2.3. □



Remark 5.1. In the special case of the scattering matrix (2.1), we can apply the same technique
as in Remark 4.6 (b) to obtain a slight improvement of (5.7). In fact, assuming also the mild
condition $8(N-s)/\varepsilon \ge e^7$, all conditions of Theorem 3.1 are satisfied with probability at least
$1-\varepsilon$ under the improved condition

$$n \ge 5\sqrt{2s}\,\log^{1/2}\big(8(N-s)/\varepsilon\big)\log\big(576 s(N-s)^2/\varepsilon\big).$$



6 Nonuniform Recovery of Scatterers with Deterministic Phase 
6.1 Set partitions 

To prove the central result of this section, we will require a few facts on certain partitions
of the set $[N]$, $N \in \mathbb{N}$. As in [25, Section 2.2] we define $\mathcal{P}(N,k)$ as the set of all partitions
of $[N]$ into exactly $k$ blocks such that each block contains at least two elements. Note that
then necessarily $k \le N/2$. The numbers $S_2(N,k) := |\mathcal{P}(N,k)|$ are called associated Stirling
numbers of the second kind. In [25, Section 3.5] it was shown that

$$S_2(N,k) \le \Big(\frac{3N}{2}\Big)^{N-k}. \tag{6.1}$$

For our purposes, we will also need partitions of $[N]$ in which not necessarily all blocks contain
at least two elements.

Definition 6.1. For $N \ge 1$, $t \le k \le N$, we define $\mathcal{P}(N,k,k-t)$ as the set of all partitions
of $[N]$ into $k$ blocks such that $k-t$ of these blocks contain at least two elements. Moreover, we
define $\mathcal{P}_{\mathrm{ex}}(N,k,k-t)$ as the set of all partitions of $[N]$ into $k$ blocks such that exactly $k-t$
blocks contain at least two elements and exactly $t$ blocks contain exactly one element.

The above definition of $\mathcal{P}(N,k,k-t)$ implies that necessarily $2(k-t) \le N-t$ and therefore

$$k \le \frac{N+t}{2}. \tag{6.2}$$

Our next goal is a convenient estimate of the numbers $S(N,k,k-t) := |\mathcal{P}(N,k,k-t)|$. We
first observe that

$$S(N,k,k-t) = \sum_{r=0}^{t}|\mathcal{P}_{\mathrm{ex}}(N,k,k-r)|.$$

Moreover, we have

$$|\mathcal{P}_{\mathrm{ex}}(N,k,k-r)| = \binom{N}{r}S_2(N-r,k-r) \le \binom{N}{r}\Big(\frac{3N}{2}\Big)^{N-k}, \tag{6.3}$$

where the last inequality follows from the estimate (6.1). Since $t \le N$ and therefore $\sum_{r=0}^{t}\binom{N}{r} \le 2^N$,
this yields

$$S(N,k,k-t) \le 2^N\Big(\frac{3N}{2}\Big)^{N-k}. \tag{6.4}$$

This estimate will become crucial in the next section.
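For small values these counts can be computed exactly. The sketch below is an illustration, not part of the paper: it evaluates the associated Stirling numbers through the standard recurrence $S_2(n,k) = k\,S_2(n-1,k) + (n-1)\,S_2(n-2,k-1)$ and then forms $S(N,k,k-t)$ via the sum over the number $r$ of singleton blocks. The function names are hypothetical, and the bounds are tested in the form $S_2(N,k) \le (3N/2)^{N-k}$ and $S(N,k,k-t) \le 2^N(3N/2)^{N-k}$, which is an assumption about their precise shape.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def assoc_stirling2(n, k):
    """Number of partitions of [n] into k blocks, each with at least 2 elements."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    # Element n either joins one of the k blocks of a valid partition of [n-1],
    # or it forms a new block of size two with one of the other n-1 elements.
    return k * assoc_stirling2(n - 1, k) + (n - 1) * assoc_stirling2(n - 2, k - 1)

def count_partitions(n, k, t):
    """S(n, k, k - t): choose r <= t singletons, partition the rest into big blocks."""
    return sum(comb(n, r) * assoc_stirling2(n - r, k - r) for r in range(t + 1))

def bounds_hold(max_n):
    """Check both bounds in exact integer arithmetic (scaled by 2^(n-k))."""
    for n in range(1, max_n + 1):
        for k in range(1, n + 1):
            if 2 ** (n - k) * assoc_stirling2(n, k) > (3 * n) ** (n - k):
                return False
            for t in range(k + 1):
                if 2 ** (n - k) * count_partitions(n, k, t) > 2 ** n * (3 * n) ** (n - k):
                    return False
    return True

print(assoc_stirling2(6, 2), assoc_stirling2(6, 3), bounds_hold(10))
```

For example, `assoc_stirling2(6, 3)` counts the perfect matchings of six elements.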



6.2 Construction of a dual certificate 

We will use combinatorial estimates inspired by the analysis in [Hl27|l25 , 8j in order to construct 
a dual certificate. Hereby, we exploit the estimates on set partitions stated above. In this way,
we will extend the recovery result of Section 2 to a vector $x \in \mathbb{C}^N$ with deterministic phase
pattern $\operatorname{sgn}(x)_T$; recall that $T$ is the set of indices corresponding to the $s$ largest absolute
entries of $x$. Since the phases are now deterministic we can no longer use the additional
concentration of measure coming from the independent randomness in the signs. In particular,
we have to estimate the probability of the event

$$\Big\{\max_{\ell\in T^c}\big|\langle\tilde A_T^\dagger\tilde a_\ell,\operatorname{sgn}(x)_T\rangle\big| \ge \frac12\Big\}$$

using only the randomness in $A$. Throughout this subsection, we will assume that the sampling
matrix $A \in \mathbb{C}^{n^2\times N}$ is given by (2.7). However, we note that exactly the same proof applies
if we take the Fourier system $\{\Phi_k\}$ from [25] instead and construct the random matrix as in
(2.17).



Let us state the central result of this section.

Theorem 6.1. Let $A \in \mathbb{C}^{n^2\times N}$ be the random sampling matrix from (2.1) and let $x \in \mathbb{C}^N$. Let
$T \subset [N]$ with $|T| = s$ be the index set of the $s$ largest absolute entries of $x$. Set $\tilde A := \frac{1}{n}A$. If

$$n^2 \ge C s \log^2(cN/\varepsilon), \tag{6.5}$$

then with probability at least $1-\varepsilon$

(i) there is a $v \in \mathbb{C}^{n^2}$ such that $u := \tilde A^* v$ and $v$ satisfy Conditions (3.2), (3.3) and (3.4) of
Theorem 3.1;

(ii) for the matrix $\tilde A$, it holds that

$$\|\tilde A_T^*\tilde A_T - \mathrm{Id}\|_{2\to 2} \le \frac{1}{e}. \tag{6.6}$$

The constants satisfy $C \le (800\,e^{3/4})^2$, $c \le 6$.
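Proof. Suppose we are on the event

$$E := \Big\{\|\tilde A_T^*\tilde A_T - \mathrm{Id}\|_{2\to 2} \le \frac{1}{e}\Big\},$$

where the constant $1/e$ in the probability is chosen to ease computations later on. Theorem 4.1
implies that $P[E^c] \le \varepsilon/4$ if Condition (6.5) holds. Our aim is an estimate for the probability
of the event

$$\tilde E := \Big\{\big\|\tilde A_{T^c}^*\tilde A_T(\tilde A_T^*\tilde A_T)^{-1}\operatorname{sgn}(x)_T\big\|_\infty \ge \frac12\Big\}. \tag{6.7}$$

By expanding the Neumann series, we observe that, for $m \in \mathbb{N}$,

$$\big(\mathrm{Id} - (\mathrm{Id} - \tilde A_T^*\tilde A_T)^m\big)^{-1} = \mathrm{Id} + \sum_{r=1}^{\infty}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^{rm}.$$

With

$$A_m := \sum_{r=1}^{\infty}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^{rm}$$

we obtain

$$(\tilde A_T^*\tilde A_T)^{-1} = \big(\mathrm{Id} - (\mathrm{Id} - \tilde A_T^*\tilde A_T)\big)^{-1} = \big(\mathrm{Id} - (\mathrm{Id} - \tilde A_T^*\tilde A_T)^m\big)^{-1}\sum_{k=0}^{m-1}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k = (\mathrm{Id} + A_m)\sum_{k=0}^{m-1}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k.$$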
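The truncated Neumann series identity can be verified numerically for a small matrix. The following sketch uses a random Hermitian perturbation of the identity as a hypothetical stand-in for the Gram matrix $\tilde A_T^*\tilde A_T$, normalized so that its distance to the identity is exactly $1/e$; the sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
s, m = 5, 4  # hypothetical sizes: sparsity and truncation order

# Hermitian perturbation H with spectral norm exactly 1/e.
H = rng.standard_normal((s, s)) + 1j * rng.standard_normal((s, s))
H = (H + H.conj().T) / 2
H *= np.exp(-1.0) / np.linalg.norm(H, 2)

G = np.eye(s) + H   # stand-in for the Gram matrix, ||Id - G|| = 1/e
B = np.eye(s) - G   # the matrix whose powers appear in the Neumann series

# Partial sum  sum_{k=0}^{m-1} B^k  and  A_m = (Id - B^m)^{-1} - Id.
partial = sum(np.linalg.matrix_power(B, k) for k in range(m))
A_m = np.linalg.inv(np.eye(s) - np.linalg.matrix_power(B, m)) - np.eye(s)

lhs = np.linalg.inv(G)
rhs = (np.eye(s) + A_m) @ partial

print("identity holds:", np.allclose(lhs, rhs))
print("||A_m||, 1/(e^m - 1):", np.linalg.norm(A_m, 2), 1.0 / (np.exp(m) - 1.0))
```

The second printed line compares the spectral norm of `A_m` with the geometric series value $1/(e^m - 1)$, which is the quantity that makes the remainder term negligible for large $m$.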

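An application to $\operatorname{sgn}(x)_T$ yields

$$\tilde A_{T^c}^*\tilde A_T(\tilde A_T^*\tilde A_T)^{-1}\operatorname{sgn}(x)_T = \tilde A_{T^c}^*\tilde A_T\sum_{k=0}^{m-1}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T + \tilde A_{T^c}^*\tilde A_T A_m\sum_{k=0}^{m-1}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T.$$

An application of the pigeon hole principle yields

$$P(\tilde E) \le P\Big(\Big\|\tilde A_{T^c}^*\tilde A_T\sum_{k=0}^{m-1}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T\Big\|_\infty \ge \frac14\Big) \tag{6.8}$$

$$\qquad\quad + P\Big(\Big\|\tilde A_{T^c}^*\tilde A_T A_m\sum_{k=0}^{m-1}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T\Big\|_\infty \ge \frac14\Big). \tag{6.9}$$

We now choose

$$m := \lceil 2\log(6N/\varepsilon)\rceil. \tag{6.10}$$

For the treatment of the event

$$\hat E := \Big\{\Big\|\tilde A_{T^c}^*\tilde A_T A_m\sum_{k=0}^{m-1}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T\Big\|_\infty \ge \frac14\Big\} \tag{6.11}$$

in (6.9) we denote by $a_\ell$ the columns of the unnormalized sampling matrix $A$ and set

$$\mu^2 := \max_{\ell\in T^c,\,k\in T}|\langle a_\ell, a_k\rangle|.$$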



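For a matrix $B \in \mathbb{C}^{m\times k}$, we denote by

$$\|B\|_{\infty\to\infty} := \sup_{\|x\|_\infty = 1}\|Bx\|_\infty = \max_{j\in[m]}\sum_{n\in[k]}|B_{jn}|$$

the operator norm of $B$ on $\ell_\infty$. We then obtain

$$\|\tilde A_{T^c}^*\tilde A_T\|_{\infty\to\infty} \le \frac{s}{n^2}\,\mu^2.$$

Moreover, for an arbitrary matrix $B \in \mathbb{C}^{s\times s}$, it follows from the definition of $\|\cdot\|_{\infty\to\infty}$ that
$\|B\|_{\infty\to\infty} \le \sqrt{s}\,\|B\|_{2\to 2}$. Conditionally on the event $E$, this inequality gives

$$\|A_m\|_{\infty\to\infty} \le \sqrt{s}\,\|A_m\|_{2\to 2} \le \sqrt{s}\sum_{r=1}^{\infty}\big\|(\mathrm{Id} - \tilde A_T^*\tilde A_T)^{rm}\big\|_{2\to 2} \le \sqrt{s}\sum_{r=1}^{\infty}\Big(\frac{1}{e^m}\Big)^r = \frac{\sqrt{s}}{e^m - 1}.$$

Similarly, we obtain

$$\Big\|\sum_{k=0}^{m-1}(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\Big\|_{\infty\to\infty} \le \sqrt{s}\,\frac{e}{e-1}.$$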



Combining these estimates, we obtain, conditionally on the event E, 

m—l , 

A T oA T A m ^ (id-I^Ir) sgn(x) T 

fc=0 

m— 1 



< 



Arc At 



k=0 



Y [Id -A* T A T 



< 



s 2 e 



^2 (e - 1) e m - 1^ ~ e"*^ - 9^ ' 
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where we have applied (6.10) and the fact that s < N in the last line. Hence, the probability 
of the event E in (6.11) can be bounded by 



(E) = P (E n E) + P (E n E c ) < 

9n 



( e " 1 



\9n 2 



1\ e 
M > - I + 



< 4s(iV - s) exp 



e e 
8e 2 J + 4 " r 



where we have applied Hoeffding's inequality, Theorem 4.4, and the union bound together with
(6.5) in the last line. It remains to estimate the term in (6.8). To this end, we define, for
$\ell \in T^c$,

$$E_\ell := \Big\{\Big|\sum_{k=0}^{m-1}\tilde a_\ell^*\tilde A_T(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T\Big| \ge \frac14\Big\}. \tag{6.12}$$

Let $\{\beta_k\}_{k=0,\ldots,m-1} \subset (0,1)$ such that $\sum_{k=0}^{m-1}\beta_k \le 1/4$ and let $M_k \in \mathbb{N}$ to be chosen below.
According to the pigeon hole principle, we have

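$$P(E_\ell) \le \sum_{k=0}^{m-1}P\Big(\big|\tilde a_\ell^*\tilde A_T(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T\big| \ge \beta_k\Big)$$

$$\qquad = \sum_{k=0}^{m-1}P\Big(\big|\tilde a_\ell^*\tilde A_T(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T\big|^{2M_k} \ge \beta_k^{2M_k}\Big)$$

$$\qquad \le \sum_{k=0}^{m-1}\beta_k^{-2M_k}\,E\big|\tilde a_\ell^*\tilde A_T(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T\big|^{2M_k},$$

where we have applied Markov's inequality in the last step. With $r(\cdot)$ denoting the function
that rounds to the closest integer, we introduce

$$M_k := r\Big(\frac{m}{k+1}\Big) \quad\text{for } k = 0,\ldots,m-1, \qquad q_k := 2M_k(k+1).$$

Then $2m/3 \le M_k(k+1) \le 4m/3$ and therefore $4m/3 \le q_k \le 8m/3$ and also $m/M_k \ge 3(k+1)/4$.
For some $\beta \in (0,1)$, we further set

$$\beta_k := \beta^{m/M_k}.$$

Then with $\beta = 1/5^{4/3}$, we have $\sum_{k=0}^{m-1}\beta_k \le 1/4$, so that we have found valid choices for the
$\beta_k$. The rest of the proof is a straightforward consequence of the following statement.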
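Before turning to that statement, the parameter choices above can be checked numerically. The sketch below is illustrative only: it assumes $m = \lceil 2\log(6N/\varepsilon)\rceil$, rounding of halves upwards in $r(\cdot)$, and $\beta_k = \beta^{m/M_k}$ with $\beta = 5^{-4/3}$; the function name and the sample values of $N$ and $\varepsilon$ are hypothetical.

```python
import math

def proof_parameters(N, eps):
    """Return m and the list of tuples (k, M_k, q_k, beta_k) for k = 0..m-1."""
    m = math.ceil(2 * math.log(6 * N / eps))
    beta = 5 ** (-4 / 3)
    rows = []
    for k in range(m):
        # Round m / (k + 1) to the nearest integer (halves rounded up),
        # done in integer arithmetic to avoid floating point ties.
        M_k = (2 * m + (k + 1)) // (2 * (k + 1))
        q_k = 2 * M_k * (k + 1)
        beta_k = beta ** (m / M_k)
        rows.append((k, M_k, q_k, beta_k))
    return m, rows

m, rows = proof_parameters(N=6400, eps=0.05)
print("m =", m)
print("4m/3 <= q_k <= 8m/3:", all(4 * m <= 3 * q <= 8 * m for _, _, q, _ in rows))
print("sum of beta_k:", sum(b for _, _, _, b in rows))
```

The sum of the $\beta_k$ stays below $1/4$ because $m/M_k \ge 3(k+1)/4$ turns it into a geometric series with ratio at most $1/5$.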

Lemma 6.2. Let $k, M \in \mathbb{N}$ be given and set $q = 2M(k+1)$. If



n 



then 



E 



a} At ( Id —A t At ) sgn(x)r 



2A1 



< Qq 



Qq\fs 



(6.13) 
(6.14)

Before we prove Lemma 6.2 let us first see how one can deduce Theorem 6.1 from it.
Condition (6.5) implies

$$n \ge 800\,e^{3/4}\sqrt{s}\,\log\Big(\frac{6N}{\varepsilon}\Big),$$

which, according to the choice $m = \lceil 2\log(6N/\varepsilon)\rceil$ of $m$ and the definition of $q$, implies (6.13).
Then (6.14) yields the series of inequalities



m— 1 



2M k 



E 



k=0 





2M k ' 


J sgn(x) T 





m—l 



-2m 



k=0 



m—l 



<P~ 2m Yj 16m 



fc=0 



16mvs\ 3 



n 



< 16m 2 



16/3- 3 / 2 m y 



n 



With $E_\ell$ denoting the events from (6.12), we further obtain, using (6.5) once more,



< ie(iv- s ; 



m 



16/3 -3/2 m ^ N 



< 16(iV - s)m 2 e- m < -. 



n 



□ 



This finishes the proof of Theorem 6.1 

What remains is the following.

Proof of Lemma 6.2. So far, we have not used that the bounded orthonormal system underlying
the random scattering matrix has the specific structure defined in (2.7). In what follows, we
will use the letter $\ell \in \mathbb{Z}^2$, possibly indexed further, to denote the rescaled positions (without
the distance coordinate) on the resolution grid where the targets can be. We furthermore
identify $[N]$ with $[\sqrt{N}]^2$ in the canonical way, thereby recovering the square grid of resolution
cells (recall that we set iV := Y1L/oIq\ 2 , where L > is the size of the target domain and do > 
denotes the meshsize of the resolution grid, so that $\sqrt{N}$ is actually the number of resolution
cells along one axis of the square array). We fix I 6 T c and set := £ for h = 1, . . . , 2M. A 
lengthy but straightforward calculation gives with $\omega := 2\pi d_0/(\lambda z_0)$



cl}At ( Id — A? At) sgn(x)r 

n n 

E E 



2M 



1 



n 4M(k+l) 



I f W fW pT i=l V 7 

(2M) (2M) (2M) (2M) (2M) '„(2A/) 



.-(1) 

'1 



I (!) 



X (24) 



x exp z- 



2A/ 



E(-d 



P =i 

2M fc+1 

exp I iu; 




2Ai 



fc+1 

p=i /i=i 



P =i 



^=1 




(6.15) 
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In order to evaluate the above term, we will use combinatorial arguments inspired by [H [25] . 



To a given word [jj^ ) we associate the partition Q of + x [2M] with the property 



p=l,...,2M 



that (h,p) and (h',p') are in the same block if and only if = j$ ' . Analogously, we associate 



Ap') 



the partition 1Z to the word ( mif ) . To each Q G Q resp. R G 1Z there exists exactly 

' n / h=l,...,k+l 

M 



p=l,...,2M 



one j'q G {1, . . . , n} resp. rrtR G {1, . . . , n} such that = j'q for all (/i, p) £ Q resp. m, 
for all G i2. We define 

{(Q,R) eQxTZ:j Q = m R }, 

{Q G Q : there exists i? = -R(Q) S 7?. such that mRiQ\ = Jq} , 
{R G 7£ : there exists Q = Q(R) G Q such that jQtm = m^} . 

With this notation, we can write 



m R 



Qnn 

VP 



E 



2M fc+1 

^E ( ')"E 



p=i 



/i=i 



? (p) 



4", 



2M fc+1 

exp[iu,£(-l)^ 

p=l 



»(p) 
■h-l 



t<P) 



h=l 



--E 



xE 



xE 



n ex p 

QeQ\Q n 

n ex p 

Ren\n n 



*>{ E (-i) p (#V4 

\(h,p)eQ 

\(h,p)eR 



H exp I iw / 

Q£Q n V \(h,p)£Q 



,(P) 
■h-l 




l h ) ' °3Q 



W E ( 

\{h,p)eR{Q) 



,(p) 
h-l 



l {p) \ b 



m R(Q) 



Observe that 



E 



n 



exp ioj 



QeQ\Q r 



n m e 



QeS\S r 




where <5 is the Kronecker delta, that is 5(0) = 1 and 5(x) = for x ^ 0. Since i^ 1 ^ 

implies that each $Q \in \mathcal{Q}\setminus\mathcal{Q}_\cap$ must contain at least two elements in order to provide a nonzero
contribution to the overall expectation of the expression in (6.15). The same is true for each
$R \in \mathcal{R}\setminus\mathcal{R}_\cap$. However, the blocks $Q \in \mathcal{Q}_\cap$ may contain just one element, since they have a
corresponding block $R(Q)$ with matching index. Therefore, we can break the evaluation of the
right hand side of (6.15) down to three basic questions.



1. What are the numbers $t_1$ resp. $t_2$ of the distinct indices appearing in the words
$w_1 := (j_h^{(p)})_{h=1,\ldots,k+1}^{p=1,\ldots,2M}$ resp. $w_2 := (m_h^{(p)})_{h=1,\ldots,k+1}^{p=1,\ldots,2M}$?

2. What is the number $t$ of indices that the words $w_1$ and $w_2$ have in common?

3. Given 1. and 2., which constraints must be fulfilled by the partitions $\mathcal{Q}$ and $\mathcal{R}$ corresponding
to $w_1$ and $w_2$?

In the following, we identify partitions of $[k+1]\times[2M]$ with partitions of $[2M(k+1)]$ in
the canonical way. Moreover, if we have a partition $\mathcal{Q} = \{Q_1,\ldots,Q_t,Q_{t+1},\ldots,Q_{t_1}\}$, we
enumerate it without loss of generality such that $Q_{t+1},\ldots,Q_{t_1}$ are the blocks containing at
least two elements and $Q_1,\ldots,Q_t$ are the blocks which might contain just one element. The
same is done for the partition $\mathcal{R} = \{R_1,\ldots,R_t,R_{t+1},\ldots,R_{t_2}\}$. We define



$$\mathcal{E} := E\Big|\tilde a_\ell^*\tilde A_T(\mathrm{Id} - \tilde A_T^*\tilde A_T)^k\operatorname{sgn}(x)_T\Big|^{2M}.$$

Using the triangle inequality and $n \ge 2M(k+1)$ implied by (6.13) together with the definitions
from Subsection 6.1 we obtain



£ < 



1 



E 



2M(fc+l) M(fc+l)+|t/2J Af (fc+l) + L*/ 2 J 

E E E 

t=0 t\=t t2=t JXf'tjti pw different 

mi,...,mi 2 pw different 
|{ji,->i«i}n{mi,...,mt 2 }|=t 

E E E 

QeV(2M(k+l),t 1 ,t 1 -t)1ZeV(2M(k+l),t2,t2-t) f' 1 ' l ...,fW l gT 



J2M) ' (2M) „ 
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E (-^(41-^)) 
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>< n me 
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xflM E (-i) p (^f-4 p) )+ E (-i) p (^i-^ 

-Ap) nip) 



(p) o(pT 

h 



(6.16) 
(6.17) 
(6.18) 



For the product $\prod_{Q\in\{Q_{t+1},\ldots,Q_{t_1}\}}\delta\big(\sum_{(h,p)\in Q}(-1)^p(\ell_{h-1}^{(p)} - \ell_h^{(p)})\big)$ to be nonzero, we must have
$\sum_{(h,p)\in Q}(-1)^p(\ell_{h-1}^{(p)} - \ell_h^{(p)}) = 0$ for all $Q \in \{Q_{t+1},\ldots,Q_{t_1}\}$, and analogously for the other
two products appearing in (6.17), (6.18). Therefore, the expressions (6.16)-(6.18) give at least
$t_1 \vee t_2 := \max\{t_1,t_2\}$ linearly independent constraints. Recalling that $q = 2M(k+1)$, this
observation yields



E 

-,4+i e T Qe i Q 



n « ( e (-d p (ft 

;+ i,...,Qt 1 } 



(p) _,(p) 



„(2M) ,(2M) „ 



n s( E (-ir(ft-4 p) )) 



x 

II 5 ( E (-!) p (ft - #>) + E (-^ (ft - ft ) ^ ^ lVt2 - 



x 



Using (6.4), we arrive at 



E E E E 
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where we have applied t\ < q in the last step. Putting these pieces together, we obtain 



8 < 




ti+t 2 



-tlVt 2 



ti+t 2 



g/2+L*/2j 

t 2 =ti 



h+t 2 



n 
3~Z 



-12 



Let us evaluate the inner sums in (6.19). Since $n \ge (3/2)q$ by (6.13) we have



q/2+\t/2\ 

E " 
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h+t 2 
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;i«) 2 e 



Similarly, using once more (6.13) in the form $n \ge (3/2)\,q\sqrt{s}$, we obtain

ti-i / \ *i+*2 / r, \ ti 



-ti < 



n 



and 



9/2+LV2J 

E 



q/2+t/2 



n 



(§*)-, 



< 2 



rr 



Plugging everything into (6.19) finishes the proof of the lemma. 



6.3 Proof of Theorem 2.1



(6.19) 



□ 



According to Theorem 6.1, all conditions of Theorem 3.1 are satisfied with probability at least
$1-\varepsilon$ provided

$$n^2 \ge C s\log^2(cN/\varepsilon),$$

where $C, c > 0$ are numerical constants satisfying the bounds claimed in Theorem 2.1. This
concludes the proof. □



7 Numerical simulations 

7.1 Chambolle and Pock's iterative primal dual algorithm 

For the numerical simulations, we use Chambolle and Pock's primal dual algorithm [9] to compute
the solution of (2.9) and (2.11). The algorithm is suited for a general convex optimization
problem of the form

$$\min_{x\in\mathbb{C}^N} F(Ax) + G(x) \tag{7.1}$$

with $A \in \mathbb{C}^{m\times N}$ and $F : \mathbb{C}^m \to (-\infty,\infty]$ and $G : \mathbb{C}^N \to (-\infty,\infty]$ lower semi-continuous convex
functions. The dual problem to (7.1) is given by

$$\max_{\xi\in\mathbb{C}^m} -F^*(\xi) - G^*(-A^*\xi), \tag{7.2}$$

where $F^*$, $G^*$ denote the convex conjugates of $F$, $G$. Here, we recall that the convex conjugate
function $F^* : \mathbb{C}^m \to (-\infty,\infty]$ is defined as

$$F^*(y) := \sup_{x\in\mathbb{C}^m}\{\operatorname{Re}(\langle x,y\rangle) - F(x)\}.$$



In the cases of interest to us, strong duality holds, meaning that the optimal values of (7.1)
and (7.2) coincide. For describing Chambolle and Pock's algorithm, we require the proximal
mappings of $F$ and $G$ defined as

$$P_G(\tau; z) := \operatorname*{arg\,min}_{x\in\mathbb{C}^N}\Big\{\tau G(x) + \frac12\|x - z\|_2^2\Big\}$$

and analogously for $F$. The iterative primal dual algorithm then reads as follows. We select
parameters $\theta \in [0,1]$, $\tau,\sigma > 0$ such that $\tau\sigma\|A\|_{2\to 2}^2 < 1$ and initial vectors $x^0 \in \mathbb{C}^N$, $\xi^0 \in \mathbb{C}^m$,
$\bar x^0 = x^0$. Then one iteratively computes

$$\xi^{n+1} := P_{F^*}(\sigma; \xi^n + \sigma A\bar x^n),$$

$$x^{n+1} := P_G(\tau; x^n - \tau A^*\xi^{n+1}),$$

$$\bar x^{n+1} := x^{n+1} + \theta(x^{n+1} - x^n).$$

In [9], it is shown that for the parameter choice $\theta = 1$ the algorithm converges in the sense that
$x^n$ converges to the minimizer of the primal problem (7.1) and $\xi^n$ converges to the solution of
the dual problem (7.2) as $n$ tends to $\infty$. Moreover, [9] also gives an estimate of the convergence
rate for a partial primal dual gap.



7.2 The algorithm for $\ell_1$-minimization

Let us now specialize to the case of $\ell_1$-minimization. We remark that to the best of our knowledge,
Chambolle and Pock's algorithm has not yet been specialized to equality-constrained
and noise-constrained $\ell_1$-minimization before, so we provide the first numerical tests of the
algorithm in this setup.

Let us first consider the problem

$$\min_{x}\|x\|_1 \quad\text{subject to}\quad Ax = y.$$

This is a special case of (7.1) with $G(x) = \|x\|_1$ and $F(z) = 0$ if $z = y$ and $\infty$ otherwise.
Straightforward computations show that for all $\xi \in \mathbb{C}^m$, $\zeta \in \mathbb{C}^N$,

$$F^*(\xi) = \operatorname{Re}(\langle\xi,y\rangle), \qquad G^*(\zeta) = \chi_{B_{\ell_\infty}}(\zeta) = \begin{cases}0 & \text{if } \|\zeta\|_\infty \le 1,\\ \infty & \text{otherwise,}\end{cases}$$

$$P_F(\sigma;\xi) = y, \qquad P_{F^*}(\sigma;\xi) = \xi - \sigma y.$$



The proximal mapping of $G(x) = \|x\|_1$ can be evaluated coordinatewise, so that it is enough to
compute the proximal of the modulus function $|\cdot|$ on $\mathbb{C}$. The latter is given by the well-known
soft-thresholding operator $S_\tau$ defined as

$$S_\tau(z) := P_{|\cdot|}(\tau, z) = \operatorname*{arg\,min}_{x\in\mathbb{C}}\Big\{\frac12|x - z|^2 + \tau|x|\Big\} = \begin{cases}\operatorname{sgn}(z)(|z| - \tau) & \text{if } |z| \ge \tau,\\ 0 & \text{otherwise,}\end{cases}$$

so that

$$P_G(\tau, z)_\ell = S_\tau(z_\ell), \quad \ell \in [N]. \tag{7.3}$$
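A minimal NumPy sketch of the complex soft-thresholding operator follows. It is an illustration rather than the authors' implementation; the function name is hypothetical, and the threshold `tau` is assumed to be strictly positive.

```python
import numpy as np

def soft_threshold(z, tau):
    """Entrywise complex soft thresholding: sgn(z) * max(|z| - tau, 0), tau > 0."""
    z = np.asarray(z, dtype=complex)
    mag = np.abs(z)
    # For |z| <= tau the numerator vanishes; for |z| > tau the denominator is |z|,
    # so the factor equals (|z| - tau) / |z| and no division by zero can occur.
    scale = np.maximum(mag - tau, 0.0) / np.maximum(mag, tau)
    return scale * z

print(soft_threshold([3 + 4j, 0.1, 0.0, -2.0], 1.0))
```

The entry $3+4i$ has modulus 5 and is shrunk to modulus 4 with its phase unchanged, while entries of modulus at most `tau` are set to zero.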

With these computations at hand, the algorithm for noise-free $\ell_1$-minimization is given by the
iterations

$$\xi^{n+1} = \xi^n + \sigma(A\bar x^n - y),$$

$$x^{n+1} = S_\tau(x^n - \tau A^*\xi^{n+1}),$$

$$\bar x^{n+1} = x^{n+1} + \theta(x^{n+1} - x^n).$$

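In the noisy case, we aim at solving

$$\min_{x}\|x\|_1 \quad\text{subject to}\quad \|Ax - y\|_2 \le \eta.$$

In this setup, $G(x) = \|x\|_1$ and

$$F(z) = \chi_{B(y,\eta)}(z) = \begin{cases}0 & \text{if } \|z - y\|_2 \le \eta,\\ \infty & \text{otherwise.}\end{cases}$$

Carrying out analogous computations as in the noise-free case, we find that the corresponding
algorithm for the noisy case consists in iteratively computing

$$\xi^{n+1} = \begin{cases}0 & \text{if } \|\sigma^{-1}\xi^n + A\bar x^n - y\|_2 \le \eta,\\ \Big(1 - \dfrac{\eta\sigma}{\|\xi^n + \sigma(A\bar x^n - y)\|_2}\Big)\big(\xi^n + \sigma(A\bar x^n - y)\big) & \text{otherwise,}\end{cases}$$

$$x^{n+1} = S_\tau(x^n - \tau A^*\xi^{n+1}),$$

$$\bar x^{n+1} = x^{n+1} + \theta(x^{n+1} - x^n).$$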
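The two iterations can be written as a single routine, since the noisy dual update reduces to the noise-free one for $\eta = 0$. The following NumPy sketch is an illustration under stated assumptions, not the authors' code: it uses a small complex Gaussian matrix normalized to spectral norm one (so that $\tau\sigma\|A\|^2 < 1$ holds for $\sigma = 1$, $\tau = 0.5$) instead of the scattering matrix, and the function names, problem sizes and iteration count are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Entrywise complex soft thresholding with threshold tau > 0."""
    mag = np.abs(z)
    return np.maximum(mag - tau, 0.0) / np.maximum(mag, tau) * z

def chambolle_pock_l1(A, y, eta=0.0, tau=0.5, sigma=1.0, theta=1.0, iters=10000):
    """Primal dual iteration for min ||x||_1 subject to ||Ax - y||_2 <= eta."""
    m, N = A.shape
    x = np.zeros(N, dtype=complex)
    x_bar = x.copy()
    xi = np.zeros(m, dtype=complex)
    for _ in range(iters):
        # Dual step: w = xi + sigma (A x_bar - y), then shrink its norm by eta*sigma.
        w = xi + sigma * (A @ x_bar - y)
        norm_w = np.linalg.norm(w)
        if norm_w <= eta * sigma:
            xi = np.zeros_like(w)
        else:
            xi = (1.0 - eta * sigma / norm_w) * w
        # Primal step followed by the extrapolation step.
        x_new = soft_threshold(x - tau * (A.conj().T @ xi), tau)
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x

rng = np.random.default_rng(0)
m, N, s = 30, 60, 3
A = rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))
A /= np.linalg.norm(A, 2)  # spectral norm 1, so tau * sigma * ||A||^2 = 0.5 < 1

x_true = np.zeros(N, dtype=complex)
support = rng.choice(N, size=s, replace=False)
x_true[support] = np.exp(2j * np.pi * rng.random(s)) * rng.uniform(1, 10, s)
y = A @ x_true

x_hat = chambolle_pock_l1(A, y)  # eta = 0: equality-constrained problem
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("relative error:", rel_err)
```

For noisy data one would pass a positive `eta` matching the noise level; the step sizes are the ones quoted for the experiments, but the best balance between `tau` and `sigma` depends on the scaling of the matrix.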



7.3 Numerical results 



We apply the above algorithm for $\ell_1$-minimization to the sensing matrices given by (2.7). We
choose the wavelength $\lambda = 0.03$ m, the resolution $d_0 = 10$ m, the distance $z_0 = 10000$ m and
the size of the aperture $B = 30$ m. Note that in this scenario, we have $d_0 B/(\lambda z_0) = 1$. To
speed up the algorithm, we exploit the fact that the matrix $A \in \mathbb{C}^{n^2\times N}$ from (2.7) can be
factorized into a product of diagonal matrices and a nonequispaced Fourier matrix. In fact,
assuming a square resolution grid, we can write the grid parameter as double index (I, £) with 
£, £ G [Ni] where Nf = N. For j, k G [n] and ctj = (£,j,rjj)i a k = f]k) we then have 



(Az) jt = exp (j^- (jKd.vM + \\{(i,Vt)\l 



E 



exp I -2„ ( «,<), I '&±&, )) exp 



B ' B I J \ Xzq 



Since the nonequispaced Fourier transform can be implemented at computational costs that
are only slightly larger than that of the Fast Fourier Transform, it gives rise to fast approximate
matrix-vector multiplication algorithms, see [24] and references therein. We use an implementation
of S. Kunis, which can be found in the Matlab toolbox associated to the paper [20]. The
algorithm is run with the renormalized matrix $\tilde A = \frac{1}{n}A$ and the parameter choices $\theta = 1$,
$\sigma = 1$ and $\tau = 0.5$. For fixed sparsity $s$, we generate a random vector in the following way:
We choose the support set uniformly at random, then we sample a Steinhaus vector on this
support and multiply its nonzero entries independently by a dynamic range coefficient uniformly
distributed on $[1, 10]$. With a fixed number of resolution cells, we vary the number $n$ of
antennas and compute empirical recovery rates by choosing the $n$ antenna positions uniformly
at random from the domain $[-B/2, B/2]^2$, where we leave the vector to be recovered fixed for
the whole period.
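The random test data described above can be generated as in the following sketch. It is an illustration only; the function names are hypothetical, and the values $N = 6400$, $s = 100$, $n = 28$ and $B = 30$ are taken from the experiments reported in this section.

```python
import numpy as np

def random_scene(N, s, rng):
    """s-sparse vector: uniform random support, Steinhaus phases, amplitudes in [1, 10]."""
    x = np.zeros(N, dtype=complex)
    support = rng.choice(N, size=s, replace=False)
    phases = np.exp(2j * np.pi * rng.random(s))
    amplitudes = rng.uniform(1.0, 10.0, size=s)
    x[support] = amplitudes * phases
    return x

def random_antennas(n, B, rng):
    """n antenna positions drawn uniformly at random from [-B/2, B/2]^2."""
    return rng.uniform(-B / 2, B / 2, size=(n, 2))

rng = np.random.default_rng(0)
x = random_scene(N=6400, s=100, rng=rng)
positions = random_antennas(n=28, B=30.0, rng=rng)
print(np.count_nonzero(x), positions.shape)
```

With $n = 28$ antennas the measurement vector has $n^2 = 784$ entries.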




[Figure 3: two plots of the empirical recovery rate against the number of antennas. Panel (a): sparsity = 100, # resolution cells = 6400, # trials = 100. Panel (b): sparsity = 100, # resolution cells = 16900, # trials = 100.]

Fig. 3 Empirical recovery rates for fixed sparsity $s = 100$ and varying number $n$ of antennas:
(a) $N = 6400$ resolution cells (b) $N = 16900$ resolution cells



With the resulting noise-free measurement vector $y$ we compute the $\ell_1$-minimizer with
Chambolle and Pock's algorithm (which takes about 300 iterations), and we record whether the
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original vector is recovered (up to numerical errors of at most 10~ 3 measured in the ^-norm). 
Repeating this test 100 times for each choice of parameters $(s, n, N)$ provides an empirical
estimate of the success probability. In Figure 3 we display the result of noiseless recovery
for fixed sparsity $s = 100$ and for $N = 6400$ respectively $N = 16900$ resolution cells. The
transition from the unsuccessful regime to the successful regime occurs at about 28 antennas,
corresponding to 784 measurements, for $N = 6400$, so in practice, the algorithm works even
better than predicted by our theoretical results. In the situation with more resolution cells,
the transition occurs at a slightly increased number of antennas. The illustration in Figure 3
was produced with the version of the algorithm for equality constrained $\ell_1$-minimization.

To test the robustness of our recovery scheme with respect to noise, we compute receiver operating
characteristic curves for various parameter choices, see [28, Chapter 6] and [23, Chapter
II.D], using the noise-constrained version of Chambolle and Pock's algorithm. We
start by simulating a target vector $x \in \mathbb{C}^{6400}$ with $\|x\|_0 = 100$, that is, we simulate 100 targets
in 6400 resolution cells. We do this as described above, that is, we select the support uniformly
at random, then simulate random phases on the support and multiply them independently by
a dynamic range coefficient uniformly distributed on $[1, 10]$. We then leave the vector $x$ fixed,
draw a realization of our random scattering matrix $A$ and run noise constrained basis pursuit
with the noisy measurements $y = Ax + e$, where $e$ is a complex Gaussian noise vector. The
entries of the recovered solution $\hat x$ are then compared to a threshold $\tau > 0$. If $|\hat x_k| < \tau$, then it
is set to zero, otherwise it remains unchanged. We then count how many of the actual targets
in $x$ are detected. The detection probability is the number of detections divided by the true
number of targets, in our case 100. Moreover, we count the number of false alarms, that is, the
number of positions $k \in [6400]$ where $\hat x_k \ne 0$ but $x_k = 0$. The false alarm probability is the
number of false alarms divided by the number of scatterers. For fixed $x$ and $\tau$, we repeat this
100 times and compute the empirical probability of detection $P_d$ and the probability of false
alarm $P_f$. This is then again repeated for varying values of the threshold $\tau$, resulting in a plot
of $P_d$ versus $P_f$, which is called the receiver operating characteristic curve.
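A single point of such a curve can be computed as in the sketch below, which follows the definitions in the preceding paragraph (both rates are normalized by the number of true targets). The function name and the toy data are hypothetical.

```python
import numpy as np

def roc_point(x_true, x_hat, tau):
    """Detection and false alarm rates after thresholding x_hat at level tau > 0."""
    detected = np.abs(x_hat) >= tau   # entries that survive the threshold
    is_target = x_true != 0           # true target positions
    n_targets = np.count_nonzero(is_target)
    p_d = np.count_nonzero(detected & is_target) / n_targets
    p_f = np.count_nonzero(detected & ~is_target) / n_targets
    return p_d, p_f

x_true = np.array([1.0, 0.0, 2.0, 0.0])
x_hat = np.array([0.9, 0.6, 0.1, 0.0])
print(roc_point(x_true, x_hat, tau=0.5))
```

Averaging these two rates over repeated noise realizations and sweeping `tau` over a grid of thresholds would trace out the empirical curve of $P_d$ versus $P_f$.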




[Figure 4: ROC curves; horizontal axis: probability of false alarm $P_f$.]

Fig. 4 ROC-curves for a fixed 100-sparse vector $x$ in 6400 resolution cells

In Figure 4 the results of the simulation are depicted. We see that if we choose the number
of antennas at the critical value 28 observed in Figure 3, then we get a significant number of
missed targets and false alarms. If, however, we slightly increase the number of antennas, we
get almost perfect detection and virtually no false alarms if we choose the threshold correctly,
in our case as $\tau \approx 0.5$. So our recovery scheme is in fact very robust with respect to noise in
the sense that the support is very well recovered. However, the quality of the approximation
of the true reflectivities decreases with the SNR, as is to be expected.